Google just shipped Gemini Omni Flash — its fastest, most cost-efficient video model — alongside Nano Banana 2 Lite. It turns a sentence into a 10-second clip with sound in about half a minute, and it does image-to-video too. It's live on Masonry today.
Here's what it's for, where it sits next to Veo, and the work it's genuinely good at.
What Gemini Omni Flash actually is
Omni Flash is the speed-and-cost tier of Google's video lineup. Google's framing is direct: it's a "high quality, cost-efficient model for video generation and conversational editing," priced at $0.10 per second of output — the same as Veo 3.1 Fast (blog.google).
What that buys you in practice: a 10-second, 720p clip with a native audio track, generated in roughly 30 seconds. It reads a plain text prompt, or takes an image and animates it. Everything below was generated on Omni Flash — no edits, no upscaling, just the prompt.
The giveaway is the motion and the light: a believable shallow depth of field, warm directional sunlight, and natural movement, held together across the whole clip from a single line of text.
Fast enough to actually iterate
The reason a fast, cheap video model matters isn't the headline number; it's the loop. At ~30 seconds and about $1.00 per 10-second clip (10 seconds × $0.10 per second), you can try a direction, look at it, and try another before you've lost the thread. That's the difference between "generate and hope" and actually art-directing.
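To put rough numbers on that loop, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted in this post ($0.10 per second of output, a fixed ~10-second clip, ~30 seconds per generation) and assumes drafts run one after another. `draft_session` is a hypothetical helper for illustration, not part of any Masonry or Google API, and real pricing or latency may differ.

```python
# Hypothetical estimate of one drafting session on Omni Flash.
# Assumptions (taken from the figures quoted in this post, not from any API):
#   - price: $0.10 per second of output
#   - every clip is a fixed ~10 seconds long
#   - each generation takes ~30 seconds, and drafts run sequentially

PRICE_PER_SECOND_USD = 0.10
CLIP_SECONDS = 10
GENERATION_SECONDS = 30


def draft_session(num_drafts: int) -> tuple[float, float]:
    """Return (cost in USD, wall-clock minutes) for num_drafts sequential drafts."""
    if num_drafts < 0:
        raise ValueError("num_drafts must be non-negative")
    # Cost scales with seconds of output; round to cents to hide float noise.
    cost = round(num_drafts * CLIP_SECONDS * PRICE_PER_SECOND_USD, 2)
    # Time scales with the number of generations, run one after another.
    minutes = num_drafts * GENERATION_SECONDS / 60
    return cost, minutes


if __name__ == "__main__":
    cost, minutes = draft_session(12)
    print(f"12 drafts: about ${cost:.2f} and {minutes:.0f} minutes")
    # prints: 12 drafts: about $12.00 and 6 minutes
```

Under these assumptions, a dozen directions cost about as much as a dozen seconds of output billed at $1.00 each, and fit inside a coffee break.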
Close-up product and lifestyle b-roll is where this lands cleanly — the kind of short, atmospheric motion clip that fills a landing page, a social tile, or an ad cut.
And it holds up on bigger, motion-heavy scenes too — an aerial pass with real camera movement, water, and light, again in a single fast generation.
Image-to-video: bring a still to life
Omni Flash isn't just text-to-video. Hand it an image and a short instruction, and it animates the scene — the model's conversational-editing side. It's the fast way to add motion to a still you already have.
Where it fits: Omni Flash vs Veo
Think of it as a draft-to-final ladder for video:
- Gemini Omni Flash — fastest, cheapest, 10s / 720p, with ambient audio. Reach for it when you're drafting, exploring directions, or generating short clips in volume.
- Veo 3.1 Fast / Veo 3.1 — the higher-fidelity workhorses. Reach for them when you need spoken dialogue and lip-sync (including Hindi and other languages), longer or vertical clips, or hero-shot quality.
- Kling — silent cinematic motion, longer durations, and per-object/camera control when you need precise direction.
The healthy workflow is to draft on Omni Flash and finalize a tier up. Because Masonry keeps every model on one hub, moving from an Omni Flash draft to a Veo final — or from a still image to a video — is a one-click step, not a tool switch.
The honest caveats
It's a preview, and it shows its edges. Output is a fixed ~10 seconds at 720p (16:9) — duration, aspect ratio, and resolution aren't controllable through the API yet. The audio is ambient and atmospheric, not precise spoken dialogue or lip-sync — for talking-head and dialogue work, use Veo 3.1. If you need vertical (9:16), longer clips, or higher resolution today, reach for Veo or Kling. Within those limits, it's hard to beat on speed-per-dollar.
How to use it on Masonry
It's live now. Two ways in:
- Pick it directly. Choose Gemini Omni Flash in the model picker and prompt it in plain language. Attach an image to animate it.
- Just ask. Tell the Masonry agent you want a fast or low-cost video draft and it'll route to Omni Flash, then help you step up to Veo when you're ready to finalize.
The verdict
Gemini Omni Flash is the model to reach for when speed and cost matter more than squeezing out every last frame of fidelity, which describes most of the drafting, iterating, and short-clip work that fills a real day. It gives you motion and sound in about thirty seconds at the lowest per-second price in the lineup, matched only by Veo 3.1 Fast. Draft on Omni Flash, finalize a tier up, and ship faster. Try it on Masonry.


