Explaining OpenAI Sora’s Spacetime Patches: The Key Ingredient
Author
Created
Feb 23, 2024 03:52 AM
Tags
technology
Type
Article
Date
Feb 15, 2024
Content
How Sora’s Unique Approach Transforms Video Generation
In the world of generative models we have seen a number of approaches, from GANs to autoregressive and diffusion models, each with its own strengths and limitations. Sora now introduces a paradigm shift, with new modelling techniques and the flexibility to handle a broad range of durations, aspect ratios, and resolutions.
Sora combines diffusion and transformer architectures into a diffusion transformer model, and it provides features such as:
• Text-to-video: As we have seen
• Image-to-video: Bringing life to still images
• Video-to-video: Changing the style of an existing video
• Extending video in time: Forwards and backwards
• Create seamless loops: Tiled videos that seem like they never end
• Image generation: A still image is a video of one frame (up to 2048 x 2048)
• Generate video in any format: From 1920 x 1080 to 1080 x 1920 and everything in between
• Simulate virtual worlds: Like Minecraft and other video games
• Create a video: Up to 1 minute in length with multiple shots
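To make two of the items above concrete (a still image as a one-frame video, and output in any format between 1920 x 1080 and 1080 x 1920), here is a minimal sketch of how such samples could be described with a single shape record. The `Clip` class, its field names, and the frame count of 24 are hypothetical and chosen only for illustration; nothing here reflects Sora's actual internals.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Clip:
    """Hypothetical shape record for a visual sample (not part of Sora)."""
    frames: int   # number of frames along the time axis
    height: int   # height in pixels
    width: int    # width in pixels

    @property
    def is_image(self) -> bool:
        # A still image is treated as a video with exactly one frame.
        return self.frames == 1

    @property
    def aspect_ratio(self) -> Fraction:
        # Width over height, reduced to lowest terms.
        return Fraction(self.width, self.height)


# The frame count of 24 is an arbitrary illustrative value.
still = Clip(frames=1, height=2048, width=2048)
landscape = Clip(frames=24, height=1080, width=1920)
portrait = Clip(frames=24, height=1920, width=1080)

for clip in (still, landscape, portrait):
    print(clip, clip.is_image, clip.aspect_ratio)
```

The point of the sketch is that one representation covers images, widescreen video, and vertical video alike: only the values along the time, height, and width axes change.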