Explaining OpenAI Sora’s Spacetime Patches: The Key Ingredient

Author

Created

Feb 23, 2024 03:52 AM

Tags

technology

Type

Article

Date

Feb 15, 2024

Content

How Sora’s Unique Approach Transforms Video Generation

In the world of generative models we have seen a number of approaches, from GANs to auto-regressive and diffusion models, all with their own strengths and limitations. Sora now introduces a paradigm shift with new modelling techniques and the flexibility to handle a broad range of durations, aspect ratios, and resolutions.

Sora combines diffusion and transformer architectures into a diffusion transformer model and is able to provide features such as:

• Text-to-video: As we have seen
• Image-to-video: Bringing life to still images
• Video-to-video: Changing the style of a video to something else
• Extending video in time: Forwards and backwards
• Create seamless loops: Tiled videos that seem like they never end
• Image generation: A still image is a video of one frame (up to 2048 x 2048)
• Generate video in any format: From 1920 x 1080 to 1080 x 1920 and everything in between
• Simulate virtual worlds: Like Minecraft and other video games
• Create a video: Up to 1 minute in length with multiple shots
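To make the flexibility claim concrete, here is a minimal sketch of how cutting a clip into spacetime patches turns videos of different durations, aspect ratios, and resolutions into token sequences that one transformer can consume. The function name `spacetime_patch_count` and the patch sizes are hypothetical, chosen only for illustration. Sora's real patch sizes are not public, and according to OpenAI's description the patches are taken from a compressed latent representation rather than from raw pixels as assumed here.

```python
from math import ceil


def spacetime_patch_count(frames, height, width, patch=(4, 16, 16)):
    """Return (grid, tokens) for a clip cut into non-overlapping spacetime patches.

    patch = (frames, pixels high, pixels wide) covered by one patch.
    These sizes are illustrative assumptions, not Sora's actual values.
    """
    pt, ph, pw = patch
    # Round up so a partially filled patch at the edge still counts
    # (equivalent to padding the clip to a whole number of patches).
    grid = (ceil(frames / pt), ceil(height / ph), ceil(width / pw))
    tokens = grid[0] * grid[1] * grid[2]
    return grid, tokens


# Same hypothetical 16-frame clip in landscape and portrait orientation.
landscape = spacetime_patch_count(16, 1080, 1920)
portrait = spacetime_patch_count(16, 1920, 1080)

# A still image treated as a one-frame video, so the temporal patch size is 1.
still = spacetime_patch_count(1, 2048, 2048, patch=(1, 16, 16))

print(landscape)  # ((4, 68, 120), 32640)
print(portrait)   # ((4, 120, 68), 32640)
print(still)      # ((1, 128, 128), 16384)
```

Under these assumptions, only the shape of the patch grid and the sequence length change between inputs, while each patch keeps the same size. That is why a single model can, in principle, handle landscape video, portrait video, and still images alike.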

media

Under The Hood Of The Generative AI For Video By OpenAI

https://medium.com/towards-data-science/explaining-openai-soras-spacetime-patches-the-key-ingredient-e14e0703ec5b