Clear glass has a reputation as the hardest thing in product photography, and it is earned. Glass reflects the whole room, it shows whatever is behind it, and it bends light into refractions and caustics that are almost impossible to control. Photographers build entire careers on it. Even AI editing tools choke on it, because they cannot tell an unwanted softbox reflection from the internal refraction that gives glass its three-dimensional shape, and they wipe out the highlights that make it read as glass at all.
So I expected from-scratch generation to fail outright. It did not. I ran one brief through four of the strongest image models: a clear stemmed wine glass half-filled with red wine on a dark wood table by a window, with refraction through the bowl and caustics on the table. The models were Nano Banana 2, GPT Image 2, Seedream 4.5, and FLUX.2 Pro, each given the same prompt. All four produced believable clear glass with real optical behavior. The interesting part is that they did not all succeed the same way: each one nailed a different piece of the physics. This is the glassware entry in our product-photography series, alongside the skincare, jewelry, supplements, makeup, food and beverage, footwear, candles, clothing, furniture, electronics, handbags, sunglasses, flowers, watches, perfume, packaging, pet products, toys, textiles, cookware, stationery, drinkware, soap, ceramics, art prints, earbuds, houseplants, knives, and automotive wheels tests and the broader best AI image model for product photography roundup.
Quick answer
- Best projected caustics: GPT Image 2. The most convincing focused light cast through the wine onto the table.
- Best in-glass refraction, at low cost: Seedream 4.5. The wine translucency, meniscus, and thin rim, in a premium macro.
- Best table reflection: FLUX.2 Pro. A clean mirror reflection of the glass on the polished surface.
- Most complete scene: Nano Banana 2. Background refraction, visible caustics, and a believable full glass.
If you only remember one thing: glass is no longer the impossible case for from-scratch AI. The models all render real optics now; they just emphasize different ones, so pick by the effect your shot is built around.
The test, model by model
One brief, four models, same prompt. I judged the optics first (refraction, caustics, reflection), then the wine and the glass itself.
GPT Image 2 rendered the most physically impressive effect: a real caustic. The window light passes through the bowl and the wine and lands on the table as a focused red pattern, exactly what happens with real glass and a colored liquid. That projected light, plus the refracted hotspot in the glass's shadow, is the single most convincing optical detail in the test. If your shot is built around dramatic light through the glass, this is the one.
Seedream 4.5 made the most beautiful close-up, and on glass that means the most convincing in-glass refraction. The wine has real translucency and depth, the meniscus where it meets the bowl is believable, the rim is rendered thin with a clean specular highlight, and the window bends through the glass wall. At roughly 4.8 credits per image, the second-lowest cost in the test, it is the best macro here. Its one liberty was the shape: it produced a coupe rather than the stemmed wine glass, a reminder that a prompt describes a style, not your SKU.
FLUX.2 Pro nailed the reflection. The stemmed glass mirrors cleanly on the polished tabletop, the wine glows where light transmits through it, and the glass kept the briefed shape. It is the most classic, on-brief hero of the four, elegant and clean, with the reflection doing the premium work. Its caustics are subtler than GPT's, but for a straightforward catalog hero of a wine glass, it is the most usable result at the lowest price.
Nano Banana 2 gave the most complete, balanced result: a full window-lit scene where the background refracts through the glass, a warm red caustic falls on the table by the base, and the wine, meniscus, and stem all read believably. It is not the single best at any one optic, but it is the most usable as a finished, in-context shot, which is often what a catalog actually needs.
The comparison
| Model | Caustics | In-glass refraction | Reflection | Kept the shape | Rough cost/image |
|---|---|---|---|---|---|
| GPT Image 2 | Best, vivid red | Good | Moderate | Stemmed (cropped) | ~26.4 credits |
| Seedream 4.5 | Not shown (macro) | Best, rim + meniscus | Subtle | Changed to a coupe | ~4.8 credits |
| FLUX.2 Pro | Subtle | Good, luminous wine | Best, clean mirror | Stemmed (on brief) | ~3.6 credits |
| Nano Banana 2 | Good, warm red | Good | Moderate | Stemmed (on brief) | ~9.3 credits |
Credit costs are first-hand from this test on Masonry; per-image rates move, so check current pricing.
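To make the price gap concrete, here is a minimal shell sketch that multiplies the per-image rates from the table by a batch size. The batch size of 20 and the `batch_cost` helper are illustrative assumptions, not part of the test; the rates are the rough figures from the table and will drift.

```shell
#!/bin/sh
# Rough credits per image, copied from the comparison table.
# These are a snapshot from one test; check current pricing before budgeting.

# batch_cost RATE COUNT -> total credits, rounded to one decimal place
# (hypothetical helper, named here only for illustration)
batch_cost() {
  awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.1f\n", rate * n }'
}

# Assumed batch size: 20 variations per model.
n=20
printf '%-14s %s credits\n' "GPT Image 2"   "$(batch_cost 26.4 "$n")"
printf '%-14s %s credits\n' "Nano Banana 2" "$(batch_cost 9.3 "$n")"
printf '%-14s %s credits\n' "Seedream 4.5"  "$(batch_cost 4.8 "$n")"
printf '%-14s %s credits\n' "FLUX.2 Pro"    "$(batch_cost 3.6 "$n")"
```

At these rates a GPT Image 2 batch costs a little over seven times the same batch on FLUX.2 Pro, which matters if you plan to run two models and compare.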
Why glass is no longer the impossible case
The reputation of glass comes from photographing real glass, where reflections and refractions fight your lighting. From-scratch generation is a different problem, and it turns out to be one these models handle well.
The physics is there now. Every model produced real optical behavior (background bending through the bowl, light landing on the table as a caustic wherever the framing showed the table, a believable meniscus) rather than a flat glass-shaped object with painted highlights. The "AI cannot do glass" assumption is based on the editing-tool failure, where the software cannot tell a reflection from a refraction. Generating from scratch sidesteps that, because the model is inventing a consistent optical scene rather than trying to surgically remove one effect from a photo.
The split is the useful finding. What separates the models is not whether they can do glass, but which optic they emphasize: GPT projects the best caustic, Seedream renders the best in-glass refraction, FLUX the best reflection, Nano the most complete scene. That is a more actionable result than a single ranking, because a barware brand shooting a hero with dramatic light wants different optics than one shooting a clean catalog reflection. Match the model to the effect.
Shape is still not guaranteed. Seedream's coupe is the reminder that even when the optics are right, the exact glass is a style the model chose, not your SKU. For your real glassware, the bowl shape and proportions are the design, so generate from a reference photo of the actual piece.
How to shoot your glassware line without a studio
The workflow is the roundup approach, tuned for a product whose whole difficulty is light. Decide what your shot is about (dramatic caustics, a clean reflection, or a translucent close-up) and pick the model that does that optic best. Run two and compare the light, not just the glass. And for the actual SKU, feed a reference photo so the exact bowl and stem are yours rather than a plausible invention.
With the Masonry CLI you can fire the same glassware prompt at every model and compare the optics side by side, or pass your real glass as a reference to keep the exact shape:
masonry image "clear stemmed wine glass with red wine on a dark table, window light, refraction and caustics, photoreal" --model gpt-image-2
masonry image "place this exact glass on a polished bar with a clean reflection, photoreal" --ref ./real-glass.png --model flux-2-pro
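To run the side-by-side comparison in one go, the first command can be wrapped in a loop. This is a sketch under stated assumptions: only the `masonry image "<prompt>" --model <slug>` form and the slugs `gpt-image-2` and `flux-2-pro` come from the commands above. The other two slugs are guesses patterned on those, and the `run_all` helper and `RUN` switch are hypothetical names for this example. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Fire one prompt at several models so the optics can be compared.
# ASSUMPTION: seedream-4.5 and nano-banana-2 are guessed slugs; confirm the
# real names in the CLI's own model list before running.
MODELS="gpt-image-2 flux-2-pro seedream-4.5 nano-banana-2"

PROMPT="clear stemmed wine glass with red wine on a dark table, window light, refraction and caustics, photoreal"

# run_all PROMPT: print one command per model (dry run).
# Set RUN=1 in the environment to execute the commands instead of printing.
run_all() {
  for model in $MODELS; do
    if [ "${RUN:-0}" = "1" ]; then
      masonry image "$1" --model "$model"
    else
      printf 'masonry image "%s" --model %s\n' "$1" "$model"
    fi
  done
}

run_all "$PROMPT"
```

Keeping the prompt in one variable means every model gets identical wording, so differences in the results come from the models rather than from the brief.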
The bottom line
Glassware turned out not to be the failure I expected. All four models render real optics now, so the choice is about which effect your shot is built around: GPT Image 2 for projected caustics, Seedream 4.5 for the best in-glass refraction at low cost, FLUX.2 Pro for a clean reflection at the lowest cost, Nano Banana 2 for the most complete scene. Judge the light rather than the silhouette, and use a reference photo when the exact glass has to be yours. See how the same fidelity-first logic plays out across every product type in our best AI image model for product photography roundup, or run your own glassware from one place with the Masonry CLI.


