What is Gemini Omni Flash?

It's Google's fastest, most cost-efficient video model. It generates a 10-second 720p clip with native audio in about 30 seconds, and handles both text-to-video and image-to-video. Google prices it at $0.10 per second of output — the same rate as Veo 3.1 Fast.

How is it different from Veo on Masonry?

Omni Flash is the fast, cheap tier — reach for it when you're drafting, exploring, or generating in volume. Veo 3.1 is the higher-fidelity workhorse for hero shots, and the right choice when you need dialogue-grade lip-sync, multiple languages, or resolution above 720p. Draft on Omni Flash, finalize on Veo.

Does it generate audio?

Yes — clips come with a native audio track (ambient sound and effects). In the examples below, tap the unmute button on any video to hear it. It is not built for precise spoken dialogue or lip-sync; use Veo 3.1 for talking-head and dialogue work.

What are the limits right now?

Output is a fixed ~10 seconds at 720p (16:9), and duration, aspect ratio, and resolution aren't yet controllable through the API. Those are preview constraints Google is expected to relax over time. For longer, vertical, or higher-resolution video today, use Veo or Kling.

How do I use it on Masonry?

Pick Gemini Omni Flash from the model picker, or ask the Masonry agent for a fast or low-cost video draft. It takes a plain text prompt, and you can attach an image to animate it (image-to-video).

Introducing Gemini Omni Flash: Fast, Affordable AI Video on Masonry

Google just shipped Gemini Omni Flash — its fastest, most cost-efficient video model — alongside Nano Banana 2 Lite. It turns a sentence into a 10-second clip with sound in about half a minute, and it does image-to-video too. It's live on Masonry today.

Here's what it's for, where it sits next to Veo, and the work it's genuinely good at.

What Gemini Omni Flash actually is

Omni Flash is the speed-and-cost tier of Google's video lineup. Google's framing is direct: it's a "high quality, cost-efficient model for video generation and conversational editing," priced at $0.10 per second of output — the same as Veo 3.1 Fast (blog.google).

What that buys you in practice: a 10-second, 720p clip with a native audio track, generated in roughly 30 seconds. It reads a plain text prompt, or takes an image and animates it. Everything below was generated on Omni Flash — no edits, no upscaling, just the prompt. Tap the unmute button on any clip to hear the audio.

Text-to-video from one sentence: 'A golden retriever puppy bounding through a sunlit meadow of wildflowers, slow motion, shallow depth of field, warm cinematic light.'

What sells the shot is the motion and the light: a believable shallow depth of field, warm directional sunlight, and natural movement, held together across the whole clip from a single line of text.

Fast enough to actually iterate

The reason a fast, cheap video model matters isn't the headline number — it's the loop. At ~30 seconds and a low per-clip cost, you can try a direction, look at it, and try another before you've lost the thread. That's the difference between "generate and hope" and actually art-directing.
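To put rough numbers on that loop, here is a minimal Python sketch that uses only the figures quoted in this post. It assumes every clip is billed for exactly 10 seconds of output at $0.10 per second, that each generation takes about 30 seconds, and that drafts run one after another. The function name `draft_budget` is hypothetical and is not part of any Masonry or Google API.

```python
# Back-of-the-envelope draft budgeting, using only figures quoted in this post.
PRICE_CENTS_PER_SECOND = 10   # $0.10 per second of output
CLIP_SECONDS = 10             # fixed ~10-second clips (assumed to be exactly 10)
GENERATION_SECONDS = 30       # "about 30 seconds" per clip, an approximation


def draft_budget(num_drafts: int) -> dict:
    """Estimate cost and wait time for a run of sequential drafts."""
    if num_drafts < 0:
        raise ValueError("num_drafts must be non-negative")
    # Work in whole cents to avoid floating-point drift while summing.
    cost_cents = num_drafts * CLIP_SECONDS * PRICE_CENTS_PER_SECOND
    return {
        "drafts": num_drafts,
        "cost_usd": cost_cents / 100,
        "wait_seconds": num_drafts * GENERATION_SECONDS,
    }


if __name__ == "__main__":
    for n in (1, 5, 20):
        b = draft_budget(n)
        minutes = b["wait_seconds"] / 60
        print(f"{b['drafts']:>2} drafts: ${b['cost_usd']:.2f}, ~{minutes:.1f} min of generation")
```

Under those assumptions, five drafts come to $5.00 and about two and a half minutes of generation time. Actual billing and latency may differ from this estimate.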

Lifestyle b-roll: a barista pouring steamed milk into a latte, warm morning light in a cozy cafe, close-up, cinematic — with ambient audio.

Close-up product and lifestyle b-roll is where this lands cleanly — the kind of short, atmospheric motion clip that fills a landing page, a social tile, or an ad cut.

Aerial nature shot: a drone pass over ocean waves crashing on a rugged coastline at golden hour, seabirds gliding, cinematic.

And it holds up on bigger, motion-heavy scenes too — an aerial pass with real camera movement, water, and light, again in a single fast generation.

Image-to-video: bring a still to life

Omni Flash isn't just text-to-video. Hand it an image and a short instruction, and it animates the scene; this is the conversational-editing side of the model at work. It's the fast way to add motion to a still you already have.

Image-to-video: starting from a single still frame, prompted to 'finish the pour and let the leaf pattern settle, gentle steam rising.'

Where it fits: Omni Flash vs Veo

Think of it as a draft-to-final ladder for video:

Gemini Omni Flash — fastest, cheapest, 10s / 720p, with ambient audio. Reach for it when you're drafting, exploring directions, or generating short clips in volume.
Veo 3.1 Fast / Veo 3.1 — the higher-fidelity workhorses. Reach for them when you need spoken dialogue and lip-sync (including Hindi and other languages), longer or vertical clips, or hero-shot quality.
Kling — silent cinematic motion, longer durations, and per-object/camera control when you need precise direction.

The healthy workflow is to draft on Omni Flash and finalize a tier up. Because Masonry keeps every model on one hub, moving from an Omni Flash draft to a Veo final — or from a still image to a video — is a one-click step, not a tool switch.
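The ladder above can be read as a small set of routing rules. The sketch below is one way to write them down in Python. It is an illustration of the guidance in this post, not how the Masonry agent routes requests. `ClipNeeds` and `suggest_models` are hypothetical names, and the returned strings are display names rather than API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipNeeds:
    """Hypothetical description of a clip request; not a Masonry or Google type."""
    spoken_dialogue: bool = False   # dialogue or lip-sync required
    camera_control: bool = False    # per-object or camera direction required
    hero_shot: bool = False         # final, highest-fidelity output
    duration_seconds: int = 10
    aspect_ratio: str = "16:9"
    height_px: int = 720


def suggest_models(needs: ClipNeeds) -> tuple[str, ...]:
    """Return display names of models worth trying, following this post's ladder."""
    if needs.spoken_dialogue:
        # Omni Flash audio is ambient and Kling is silent, so dialogue goes to Veo.
        return ("Veo 3.1",)
    if needs.camera_control:
        return ("Kling",)
    if needs.hero_shot:
        return ("Veo 3.1",)
    # Preview limits quoted in this post: ~10 s, 16:9, 720p.
    beyond_preview_limits = (
        needs.duration_seconds > 10
        or needs.aspect_ratio != "16:9"
        or needs.height_px > 720
    )
    if beyond_preview_limits:
        return ("Veo 3.1", "Kling")
    return ("Gemini Omni Flash",)
```

The order of the checks is a design choice in this sketch: dialogue is tested first because it is the one requirement that only Veo covers in this lineup, and Omni Flash is the default for everything that fits inside the preview limits.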

The honest caveats

It's a preview, and it shows its edges. Output is a fixed ~10 seconds at 720p (16:9) — duration, aspect ratio, and resolution aren't controllable through the API yet. The audio is ambient and atmospheric, not precise spoken dialogue or lip-sync — for talking-head and dialogue work, use Veo 3.1. If you need vertical (9:16), longer clips, or higher resolution today, reach for Veo or Kling. Within those limits, it's hard to beat on speed-per-dollar.

How to use it on Masonry

It's live now. Two ways in:

Pick it directly. Choose Gemini Omni Flash in the model picker and prompt it in plain language. Attach an image to animate it.
Just ask. Tell the Masonry agent you want a fast or low-cost video draft and it'll route to Omni Flash, then help you step up to Veo when you're ready to finalize.

The verdict

Gemini Omni Flash is the model to reach for when speed and cost matter more than squeezing out every last frame of fidelity, which describes most of the drafting, iterating, and short-clip work that fills a real day. It gives you motion and sound in about thirty seconds at $0.10 per second of output, a rate it shares with Veo 3.1 Fast. Draft on Omni Flash, finalize a tier up, and ship faster. Try it on Masonry.