May 2025 Image Generation OpenAI vs Gemini ImageGen2 vs Gemini ImageGen3 vs Grok


After my initial proof-of-concept comparison and follow-up 4o quality comparison gained traction on AI Twitter, I decided to conduct a more thorough survey across the providers.

The Testing Approach

For this expanded analysis, I started with a prompt specifically designed to test different aspects of image generation capabilities:

"A fruit cart tumbles down some stairs. The fruit cart sign reads 'Apple-a-day'" – This challenges the model with complex physics (tumbling objects), multiple elements (fruit, cart, stairs), and text rendering.

Again following Edward Tufte’s visualization principles, I’ve arranged the outputs in small multiples to facilitate direct visual comparison. For each prompt, I generated multiple images across different quality settings to evaluate consistency, feature accuracy, and stability.
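
To make the small-multiples idea concrete, here is a minimal sketch of how a comparison sheet could be divided into equal cells. It is not the layout code behind the image in this post. The function name, the `Cell` type, and the column and row labels are all hypothetical; only the 4800x2700 sheet size comes from the large image mentioned in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Pixel box for one tile in a comparison sheet."""
    label: str
    left: int
    top: int
    width: int
    height: int


def small_multiples_layout(row_labels, col_labels, sheet_width, sheet_height, gap=0):
    """Split a sheet into equal cells, one per (row, column) pair.

    Every cell has the same size, so differences between tiles come from
    the images and not from the layout. Integer division may leave a few
    unused pixels at the right and bottom edges.
    """
    rows, cols = len(row_labels), len(col_labels)
    if rows == 0 or cols == 0:
        raise ValueError("need at least one row and one column")
    cell_w = (sheet_width - gap * (cols - 1)) // cols
    cell_h = (sheet_height - gap * (rows - 1)) // rows
    if cell_w <= 0 or cell_h <= 0:
        raise ValueError("sheet is too small for this grid and gap")

    cells = []
    for r, row in enumerate(row_labels):
        for c, col in enumerate(col_labels):
            cells.append(
                Cell(
                    label=f"{col} / {row}",
                    left=c * (cell_w + gap),
                    top=r * (cell_h + gap),
                    width=cell_w,
                    height=cell_h,
                )
            )
    return cells


# Hypothetical labels: one column per provider, one row per prompting style.
providers = ["OpenAI 4o", "Gemini ImageGen 2", "Gemini ImageGen 3", "Grok"]
styles = ["direct output", "style-matched A", "style-matched B"]
cells = small_multiples_layout(styles, providers, 4800, 2700)
```

Each `Cell` gives the box where one generated image would be pasted by whatever imaging tool you prefer. Keeping every cell the same size is the point of small multiples.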

Findings: “A fruit cart tumbles down some stairs. The fruit cart sign reads ‘Apple-a-day’”

Figure: Apple Cart, side by side. OpenAI 4o vs Google Gemini ImageGen 2/3 vs Grok

Click for an enormous zoomable version: large 4800x2700px raw link

The top row is direct output with no further prompting. For the middle and bottom rows, I prompted the models so that the art style was closer to the direct output from OpenAI 4o. Matching the style more closely makes it easier to compare the other aspects. Note that Google Gemini and Grok choose photographic images for their direct output.

Conclusions and Practical Implications

I recommend you click for the large zoomable image and take a look for yourself.

OpenAI 4o
- 🏆OpenAI 4o is the class leader🏆
- The results are very consistent between generations
- The results are also very consistent between quality settings. This allows ‘draft’ quality image generation for less money before a ‘final print’ generation.
- The API access is straightforward
- The images are calmer, focusing on just the subject matter and well composed
  - This model knows less is more sometimes
- The generated written text is superb
  - It only puts the text required here and doesn’t add extra texty-bits
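
The draft-then-final idea above comes down to simple arithmetic, sketched below. The function names are hypothetical and the prices are placeholder units, not any provider's real pricing; substitute the current per-image prices for the quality tiers you actually use.

```python
def iteration_cost(n_drafts, draft_price, final_price, n_finals=1):
    """Total spend when exploring with cheap drafts, then rendering finals."""
    if min(n_drafts, n_finals) < 0 or min(draft_price, final_price) < 0:
        raise ValueError("counts and prices must be non-negative")
    return n_drafts * draft_price + n_finals * final_price


def savings_vs_all_final(n_drafts, draft_price, final_price, n_finals=1):
    """Money saved compared with generating every image at final quality."""
    all_final = (n_drafts + n_finals) * final_price
    return all_final - iteration_cost(n_drafts, draft_price, final_price, n_finals)


# Placeholder prices in arbitrary units (NOT real pricing):
# a draft costs 1 unit, a final-quality image costs 10 units.
spend = iteration_cost(n_drafts=8, draft_price=1, final_price=10)
saved = savings_vs_all_final(n_drafts=8, draft_price=1, final_price=10)
```

The saving only matters if a draft is a reliable preview of the final image, which is why the consistency between quality settings is worth noting.
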
Google Gemini
- Extra gibberish text is generated

Google Gemini put in extra gibberish text

Google Gemini (cont)
- The images are busy. Google thinks more is more, but often for humans less is more.
- It’s really confusing how to actually get the latest model.
  - I tried the Python SDK but got confusing errors
    - Some errors were very unhelpful
      - Some errors implied the user had to ask for special permissions. I gave up at this point
    - The online instructions have bugs and lack end-to-end cohesion. You are on your own and there is little help
  - In the end I used labs.ImageFX, but the exact model is hidden away in fine print

Google ImageFX hides away the model in use

Grok
- Default output is horizontal landscape. This is just a difference; it is neither better nor worse than square
- The images are busy, like Google Gemini. To Grok, more is more, but often for humans less is more
- The text is more restrained than Google Gemini, with no obvious invented gibberish.
- The text adherence is worse than OpenAI, often with mistakes on the key portions

What’s Next?

Let me know by Twitter DM which image generation provider you’re planning to use!


By Terry Lurie