GPT Image 2 is OpenAI's image generation model built for the part of the job most models still get wrong: legible text, correct labels, and product surfaces that hold up under scrutiny. If you sell physical products and you've been burned by AI images with garbled packaging copy or shadows that fall in the wrong direction, this is the model worth your time.
OpenAI shipped it on April 21, 2026 as gpt-image-2 (pinned snapshot gpt-image-2-2026-04-21), available through the API and Codex (OpenAI announcement). The short version of our take: it's slower and pricier than the alternatives, and for catalog work that tradeoff is usually worth it. You can try it on its model page and run it against other models on your own product before you commit.
The one-line answer
If your image has words in it or a real product at the center of it, use GPT Image 2. If you're making moody lifestyle scenes, abstract art, or you need 200 social variations on a tight budget, a faster and cheaper model is the smarter pick. This is a precision tool, not a volume tool.
What's actually new
OpenAI calls GPT Image 2 its most capable image model, and the headline claims are stronger text rendering, better layouts, more reliable instruction-following, and improved editing (OpenAI announcement). The model also plugs into a thinking mode when paired with reasoning models, so it plans the structure of an image before rendering it instead of generating in one shot.
The number that backs this up: OpenAI reports a 242-point lead over the field in Text-to-Image on the Image Arena leaderboard (OpenAI announcement). Leaderboards aren't the whole story, but the gap is unusually large, and it shows up in real use. Text is where this model pulls ahead.
It supports text-to-image, image editing with masks, and high-fidelity image inputs (OpenAI model docs). Worth knowing up front: streaming, function calling, structured outputs, and fine-tuning are not supported on this model (OpenAI model docs). Its training knowledge also cuts off in December 2025, so a product that launched after that won't be in the model's head unless you feed it a reference image.
Why it's reliable for ecommerce
Most AI image models can paint a beautiful bottle. The problem starts when the bottle needs a real label, the label needs the right words, and the glass needs to reflect the studio lights the way glass actually does. That's the gap GPT Image 2 closes, and it's why practitioners reach for it on commercial work.
Text on products holds up
This is the single biggest reason to use it. The model renders text inside images, including signage, UI labels, posters, and handwritten notes, with correct spelling and consistent spacing (fal.ai model page). In fal's hands-on review, "product labels read like product labels," and multi-line headlines, mixed font weights, and even CJK characters came out clean on the first or second try, removing a manual touch-up step that used to be unavoidable (fal.ai review).
For ecommerce that's the whole ballgame. A skincare bottle with a readable ingredient line, a coffee bag with the roast name spelled right, a supplement label that doesn't dissolve into AI mush. You stop spending an hour in Photoshop fixing every generation.
Shadows and reflections behave
fal's reviewers ran a watchmaker's workbench test and saw physically plausible reflections on polished steel, with contact shadows matched to the lighting direction, something the previous generation struggled with (fal.ai review). For a product shot, grounded shadows and correct reflections are the difference between "looks shot in a studio" and "looks AI." A floating object with a shadow that points the wrong way reads as fake instantly, even to a casual shopper.
Editing preserves the rest of the scene
The dedicated edit endpoint with mask support does precise inpainting and outpainting (fal.ai model page). In one test the model swapped a paperback for a hardcover while leaving six other objects, their shadows, and the lighting untouched (fal.ai review). That matters for catalogs. You can change one product variant, swap a background, or fix a label without re-rolling the whole image and losing the parts you liked.
fal's own docs list product photography variations, background replacement, and packaging visualization as supported use cases, and call out commercial workflows where consistency matters (fal.ai model page). That's not marketing fluff in this case. It tracks with what the model does well.
Resolutions and pricing
GPT Image 2 supports flexible dimensions up to 4K. On fal, both edges must be multiples of 16, the maximum edge is 3,840px, the aspect ratio caps at 3:1, and total pixels fall between 655,360 and 8,294,400 (fal.ai model page). There are three quality tiers: low, medium, and high (the default).
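The size rules above are simple enough to check before you send a request. Here's a minimal sketch, assuming the limits exactly as quoted in this section; the function names and constants are our own illustration, not part of fal's or OpenAI's SDK.

```python
# Limits as quoted in the text above (fal.ai model page).
# All names here are illustrative, not part of any SDK.
EDGE_MULTIPLE = 16
MAX_EDGE = 3840
MAX_ASPECT_RATIO = 3          # longest edge : shortest edge
MIN_PIXELS = 655_360
MAX_PIXELS = 8_294_400


def size_problems(width: int, height: int) -> list[str]:
    """Return the rules a size breaks; an empty list means it passes."""
    if width <= 0 or height <= 0:
        return ["both edges must be positive"]
    problems = []
    if width % EDGE_MULTIPLE or height % EDGE_MULTIPLE:
        problems.append("both edges must be multiples of 16")
    if max(width, height) > MAX_EDGE:
        problems.append("maximum edge is 3840px")
    # Integer comparison avoids floating-point ratio edge cases.
    if max(width, height) > MAX_ASPECT_RATIO * min(width, height):
        problems.append("aspect ratio must not exceed 3:1")
    if not MIN_PIXELS <= width * height <= MAX_PIXELS:
        problems.append("total pixels must be between 655,360 and 8,294,400")
    return problems


def is_valid_size(width: int, height: int) -> bool:
    return not size_problems(width, height)
```

One thing the check surfaces: under the rules as written, 1920×1080 fails the multiple-of-16 test (1080 is 16 × 67.5), even though it appears in the pricing table. How preset sizes are handled is outside what this sketch models, so treat it as a pre-flight check, not the final word.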
Pricing scales with both resolution and quality. Here's the fal per-image breakdown (fal.ai model page):
| Resolution | Low | Medium | High |
|---|---|---|---|
| 1024×768 | $0.005 | $0.037 | $0.145 |
| 1024×1024 | $0.006 | $0.053 | $0.211 |
| 1024×1536 | $0.005 | $0.042 | $0.165 |
| 1920×1080 | $0.005 | $0.040 | $0.158 |
| 2560×1440 | $0.007 | $0.056 | $0.222 |
| 3840×2160 (4K) | $0.012 | $0.101 | $0.401 |
Through OpenAI's own API, the model is priced by tokens: $8.00 per 1M input tokens and $30.00 per 1M output tokens for the image modality (OpenAI announcement).
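To turn those token rates into a per-request figure, the arithmetic is just tokens times rate. This is a rough sketch: the two rates come from the text above, but the token counts in the example are hypothetical, since how many tokens a given image consumes isn't covered here.

```python
# Rates quoted in the text (OpenAI announcement), in USD per 1M tokens.
INPUT_USD_PER_MILLION = 8.00
OUTPUT_USD_PER_MILLION = 30.00


def token_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the image-modality token rates."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical token counts, for illustration only.
example = token_cost_usd(input_tokens=1_000, output_tokens=5_000)
print(f"${example:.3f}")  # prints $0.158
```

Plug in the usage numbers from your own API responses to get a real figure.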
The practical read on cost: native 4K at high quality runs about $0.40 an image, which gets expensive fast at catalog scale. fal's reviewers suggest generating cheaply at low quality and then upscaling, rather than paying for 4K every time (fal.ai review). For most ecommerce work, medium quality at a 1024 or 1536 dimension hits the right balance: per the table above, that's roughly four to five cents a shot, not forty.
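To see how that plays out at catalog scale, here's a small sketch that multiplies the fal per-image prices from the table by a batch size. The prices are copied from the table; the 500-image catalog and the function name are hypothetical.

```python
# Per-image prices in USD, copied from the fal table above.
FAL_PRICE_USD = {
    "1024x768":  {"low": 0.005, "medium": 0.037, "high": 0.145},
    "1024x1024": {"low": 0.006, "medium": 0.053, "high": 0.211},
    "1024x1536": {"low": 0.005, "medium": 0.042, "high": 0.165},
    "1920x1080": {"low": 0.005, "medium": 0.040, "high": 0.158},
    "2560x1440": {"low": 0.007, "medium": 0.056, "high": 0.222},
    "3840x2160": {"low": 0.012, "medium": 0.101, "high": 0.401},
}


def batch_cost_usd(size: str, quality: str, images: int) -> float:
    """Total cost of generating `images` images at one size and quality."""
    return round(FAL_PRICE_USD[size][quality] * images, 2)


# Hypothetical catalog of 500 images, one generation each.
CATALOG = 500
for size, quality in [("3840x2160", "high"),
                      ("1024x1024", "medium"),
                      ("3840x2160", "low")]:
    print(f"{size} {quality}: ${batch_cost_usd(size, quality, CATALOG):.2f}")
```

For 500 images that works out to $200.50 at 4K high against $26.50 at 1024×1024 medium, before counting any re-rolls or upscaling costs.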
Where it falls short
No model is all upside, and pretending otherwise wastes your money. Here's where GPT Image 2 will frustrate you.
It's slow. The built-in reasoning that makes it accurate also makes it noticeably slower than competitors (fal.ai review). If you're iterating quickly or batching hundreds of images, the wait adds up.
4K is overpriced. At roughly $0.40 a high-quality 4K image, it's hard to justify at scale (fal.ai review). Plan to upscale instead.
The knowledge cutoff bites. With training data through December 2025, the model won't know recent product designs or events. For a product that launched this spring, you'll need to supply a reference image or it'll guess (fal.ai review).
Character consistency isn't its strong suit. If you need the same person or mascot across a dozen images, other models hold a character better.
How it compares for product work
GPT Image 2 isn't the only good option, and the honest answer is that the right model depends on what's in the frame.
Against Nano Banana 2, GPT Image 2 wins on text precision and color fidelity, while Nano Banana 2 wins on speed, character consistency, and 4K pricing: roughly $0.16 versus $0.40 for a high-quality 4K image (fal.ai review). So for a hero shot of a labeled product, pick GPT Image 2. For fast 4K lifestyle scenes or a recurring character, pick Nano Banana 2.
Against FLUX and Midjourney, fal positions GPT Image 2 as best-in-class for text-heavy marketing creatives and UI mockups (fal.ai review). FLUX and Midjourney still produce gorgeous, stylized imagery and are often faster and cheaper, but neither matches GPT Image 2 when correct words have to land on the canvas. If your creative is a banner ad with a headline and a price, that tilts hard toward GPT Image 2.
The smart move is to not guess. Run the same product through two or three models and compare. On Masonry you can do exactly that: generate with GPT Image 2 and a cheaper or faster model side by side on your own product, and decide with your own eyes instead of someone else's leaderboard.
Prompt tips for product-accurate results
A few habits get you sharper, more usable output. These come from running product prompts, not from the docs.
Spell out the label text exactly. Put the words you want on the product in quotes and keep them short. "A matte black coffee bag with the label reading 'DARK ROAST' in bold white sans-serif" beats hoping the model invents good copy. GPT Image 2 will render what you write, so write it.
Describe the lighting and surface. Say "soft studio softbox from the upper left, glossy reflection on the bottle, soft contact shadow grounding it on a white surface." This is where the model's shadow and reflection handling earns its keep, but only if you tell it the setup.
Lead with the product, not the scene. For product marketing, the hero of the frame should be the physical product (the bottle, sneaker, can, or bag), not a lifestyle moment with the product buried in it. Put it first in the prompt and give it the center of the composition.
Use a reference image for anything recent or specific. The December 2025 cutoff means new packaging won't be known. Feed an image input and the model will match your actual product instead of approximating it.
Edit, don't re-roll. When 90% of a shot is right, use the mask edit endpoint to fix the label or swap the background. You'll keep the lighting and composition you already liked.
Start at medium quality. Generate at medium and a 1024 or 1536 dimension while you iterate, then bump the winners to high or upscale. No reason to pay 4K rates on drafts you'll throw away.
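The first three tips above (exact label text, lighting and surface, product first) can be folded into a small prompt helper so every SKU gets the same structure. This is a sketch of one way to do it; the function and its parameter names are our own, and the wording template is just a starting point to adapt.

```python
def build_product_prompt(
    product: str,
    label_text: str,
    label_style: str,
    lighting: str,
    surface: str,
) -> str:
    """Assemble a prompt that leads with the product, quotes the label
    copy verbatim, and then states the lighting and surface."""
    label_text = label_text.strip()
    if not label_text:
        raise ValueError("label_text must not be empty")
    return (
        f"{product} with the label reading '{label_text}' in {label_style}, "
        f"centered in the frame. {lighting}, {surface}."
    )


prompt = build_product_prompt(
    product="A matte black coffee bag",
    label_text="DARK ROAST",
    label_style="bold white sans-serif",
    lighting="Soft studio softbox from the upper left",
    surface="soft contact shadow grounding it on a white surface",
)
print(prompt)
```

Keeping the label copy as its own argument makes it hard to forget, and easy to swap per variant without touching the lighting setup you've already dialed in.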
FAQ
Is GPT Image 2 good for ecommerce product photos? Yes, it's one of the best choices specifically because it renders labels and packaging text correctly and handles shadows and reflections in a physically plausible way (fal.ai review). That's the exact failure point of most image models for product work.
What resolutions does it support? Up to 4K. On fal, both edges must be multiples of 16, max edge 3,840px, aspect ratio up to 3:1, and total pixels between 655,360 and 8,294,400 (fal.ai model page).
How much does GPT Image 2 cost? On fal, per image it ranges from about half a cent at low quality up to roughly $0.40 for high-quality 4K (fal.ai model page). Through OpenAI's API it's $8 per 1M input tokens and $30 per 1M output tokens for the image modality (OpenAI announcement).
Can it edit existing product photos? Yes. It has a dedicated edit endpoint with mask support for precise inpainting and outpainting, so you can change one element while preserving the rest of the scene (fal.ai model page).
GPT Image 2 vs Nano Banana 2, which is better for products? GPT Image 2 for anything with text or labels, or where color accuracy matters. Nano Banana 2 if you need speed, cheaper 4K, or the same character across many images (fal.ai review).
What can't it do well? It's slower than competitors, 4K is expensive, the knowledge cutoff is December 2025, and it's not the best at holding a consistent character across many images (fal.ai review).
The bottom line
GPT Image 2 earns its place on the shortlist for one clear reason: it's the model you trust when the words on the product have to be right and the shadows have to look real. It costs more and runs slower than the alternatives, so don't reach for it when you're spinning up fifty quick social variations. Reach for it when a real customer is going to look closely at your product and you can't afford for it to look fake.
Want to see how it handles your actual product? Try GPT Image 2 on Masonry and compare it against other image models on the same prompt before you decide.