Meta works on Emu Video and Emu Edit: Generative AI tricks for GIFs, photos and 4-second videos
Meta has announced through a blog post that it is working on new research into “controlled image editing based solely on text instructions and a method for text-to-video generation based on diffusion models”.
In simpler terms, Meta wants to bring generative AI tools to Facebook and Instagram. The two projects in development are called Emu Video and Emu Edit.
What is Emu Video?
This tool, as the name suggests, is for generating video. Meta describes it as “a simple method for text-to-video generation based on diffusion models”. Emu Video should respond to a variety of inputs: text only, image only, and both text and image. The process is split into two steps, Meta clarifies: first, generating images conditioned on a text prompt, and then generating video conditioned on both the text and the generated image.
“Our state-of-the-art approach is simple to implement and uses just two diffusion models to generate 512x512 four-second-long videos at 16 frames per second”, Meta adds.
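For intuition, here is a minimal sketch of what that factorized, two-step pipeline could look like in code. Everything below is hypothetical: Meta has not published an Emu Video API, so the class names, method signatures, and the `emu_video` helper are illustrative stand-ins for the two diffusion models the post describes.

```python
# Hypothetical sketch of Emu Video's two-step, "factorized" generation as the
# blog post describes it. None of these classes is a real Meta API; the stub
# bodies just return dummy data so the control flow is runnable.

from dataclasses import dataclass
from typing import List, Optional

Image = List[List[int]]  # placeholder type standing in for a 512x512 image


@dataclass
class Video:
    frames: List[Image]  # 4 seconds at 16 fps -> 64 frames
    fps: int = 16


class TextToImageDiffusion:
    """Stands in for the first diffusion model (text -> image)."""
    def generate(self, prompt: str) -> Image:
        return [[0] * 512 for _ in range(512)]  # dummy 512x512 image


class ImageToVideoDiffusion:
    """Stands in for the second diffusion model (text + image -> video)."""
    def generate(self, prompt: str, image: Image) -> Video:
        return Video(frames=[image] * 64)  # dummy 64-frame clip


def emu_video(prompt: str, image: Optional[Image] = None) -> Video:
    # Step 1: for text-only input, first generate an image from the prompt.
    if image is None:
        image = TextToImageDiffusion().generate(prompt)
    # Step 2: generate video conditioned on both the text and that image.
    return ImageToVideoDiffusion().generate(prompt, image)


clip = emu_video("a sloth surfing at sunset")
print(len(clip.frames), clip.fps)  # 64 frames at 16 fps -> 4 seconds
```

The appeal of this factorization, as the post frames it, is that the same two models cover all three input modes: a text-only prompt goes through both steps, while an image (with or without text) can skip straight to the animation step.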
What is Emu Edit?
This one should allow “precise image editing” via recognition and generation tasks. As Meta notes, editing with generative AI is often a multi-step process, not a single task.
“Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more. Current methods often lean towards either over-modifying or under-performing on various editing tasks. We argue that the primary objective shouldn’t just be about producing a ‘believable’ image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request. Unlike many generative AI models today, Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched. For instance, when adding the text ‘Aloha!’ to a baseball cap, the cap itself should remain unchanged”, says the Meta team.
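To make the “only the relevant pixels” property concrete, here is a toy illustration. This is not Meta’s method: Emu Edit is a learned model trained on recognition and generation tasks, not an explicit masking pipeline. The sketch below (with an assumed `relevance_mask` and NumPy arrays standing in for images) only demonstrates the guarantee the team describes: output pixels outside the edited region stay identical to the input.

```python
# Illustrative only: this explicit-mask composite is NOT how Emu Edit works
# internally. It just demonstrates the property the Meta team describes --
# pixels unrelated to the instruction remain untouched.

import numpy as np


def apply_precise_edit(image: np.ndarray,
                       edited: np.ndarray,
                       relevance_mask: np.ndarray) -> np.ndarray:
    """Composite an edited image over the original.

    relevance_mask is True only where the instruction applies (say, the
    patch of the cap where 'Aloha!' is written). Everywhere else the
    output is guaranteed to equal the input."""
    return np.where(relevance_mask[..., None], edited, image)


# Tiny demo with random arrays standing in for real images.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)
candidate = rng.integers(0, 256, (512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=bool)
mask[100:160, 200:320] = True  # pretend this is where the text goes

result = apply_precise_edit(original, candidate, mask)
assert np.array_equal(result[~mask], original[~mask])  # untouched outside mask
```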
The potential use cases
The road ahead is definitely AI-driven for Meta.
“Although this work is purely fundamental research right now, the potential use cases are clearly evident. Imagine generating your own animated stickers or clever GIFs on the fly to send in the group chat rather than having to search for the perfect media for your reply. Or editing your own photos and images, no technical skills required. Or adding some extra oomph to your Instagram posts by animating static photos. Or generating something entirely new”, the blog post concludes.