Table of Contents
- Overview
- “A fruit cart tumbles down some stairs. The fruit cart sign reads ‘Apple-a-day’”
- Conclusions and Practical Implications
- What’s Next?
May 2025 Image Generation: OpenAI vs Gemini ImageGen2 vs Gemini ImageGen3 vs Grok
After my initial proof-of-concept comparison and follow-up 4o quality comparison gained traction on AI Twitter, I decided to conduct a more thorough survey across the providers.
The Testing Approach
For this expanded analysis, I started with a prompt specifically designed to test different aspects of image generation capabilities:
"A fruit cart tumbles down some stairs. The fruit cart sign reads 'Apple-a-day'"
This prompt challenges the model with complex physics (tumbling objects), multiple elements (fruit, cart, stairs), and text rendering.
Again following Edward Tufte’s visualization principles, I’ve arranged the outputs in small multiples to facilitate direct visual comparison. For each prompt, I generated multiple images across different quality settings to evaluate consistency, feature accuracy, and stability.
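As a rough sketch of how a small-multiples grid like this could be laid out, the snippet below splits a canvas into equal cells. The 4800x2700 canvas size matches the large comparison image; the column and row labels, the `Cell` type, and the `small_multiples_grid` helper are hypothetical names for illustration, not the tooling I actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: int
    col: int
    left: int
    top: int
    width: int
    height: int


def small_multiples_grid(canvas_w, canvas_h, rows, cols, gap=0):
    """Split a canvas into rows x cols equally sized cells, `gap` pixels apart."""
    cell_w = (canvas_w - gap * (cols - 1)) // cols
    cell_h = (canvas_h - gap * (rows - 1)) // rows
    return [
        Cell(r, c, c * (cell_w + gap), r * (cell_h + gap), cell_w, cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed layout: one column per provider, one row per prompting variant.
providers = ["OpenAI 4o", "Gemini ImageGen2", "Gemini ImageGen3", "Grok"]
variants = ["direct", "style-matched 1", "style-matched 2"]

cells = small_multiples_grid(4800, 2700, rows=len(variants), cols=len(providers))
for cell in cells:
    box = (cell.left, cell.top, cell.width, cell.height)
    print(variants[cell.row], "|", providers[cell.col], "|", box)
```

Keeping every cell the same size is the point of small multiples: the eye compares content, not frames.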
Findings: “A fruit cart tumbles down some stairs. The fruit cart sign reads ‘Apple-a-day’”

Click for an enormous zoomable version: large 4800x2700px raw link
The top row is direct output with no further prompting. For the middle and bottom rows, I prompted the models so that the art style was closer to the direct output from OpenAI 4o. The more closely matched style aids comparison on other aspects. You will see that Google Gemini and Grok choose photographic images when creating direct output.
Conclusions and Practical Implications
I recommend you click for the large zoomable image and take a look for yourself.
- OpenAI 4o
- 🏆OpenAI 4o is the class leader🏆
- The results are very consistent between generations
- The results are also very consistent between quality settings. This allows ‘draft’ quality image generation for less money before a ‘final print’ generation.
- The API access is straightforward
- The images are calmer and well composed, focusing on just the subject matter
- This model knows less is more sometimes
- The generated written text is superb
- It only puts the text required here and doesn’t add extra texty-bits
- Google Gemini
- Extra gibberish text is generated
- The images are busy. Google thinks more is more, but often for humans less is more.
- It’s really confusing how to actually get the latest model.
- I tried the SDK Python API but got confusing errors
- Some errors were very unhelpful
- Some errors implied the user had to ask for special permissions. I gave up at this point
- The online instructions have bugs and lack end-to-end cohesion. You are on your own and there is little help
- In the end I used labs.ImageFX, but the exact model is hidden away in fine print
- Grok
- Default output is horizontal landscape. This is just a difference; it is neither better nor worse than square
- The images are busy, like Google Gemini. To Grok, more is more, but often for humans less is more
- The text is more restrained than Google Gemini, with no obvious invented gibberish.
- The text adherence is worse than OpenAI, often with mistakes on the key portions
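The ‘draft’ then ‘final print’ workflow mentioned under OpenAI 4o could look something like the sketch below. It is provider-agnostic on purpose: `generate` is a hypothetical callable standing in for whatever image API you use, and the quality names "low" and "high" are placeholder assumptions, not confirmed parameter values. A fake generator is used here so the example runs without network access.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical signatures: (prompt, quality) -> image bytes, and image bytes -> bool.
GenerateFn = Callable[[str, str], bytes]
ApproveFn = Callable[[bytes], bool]


def draft_then_final(
    generate: GenerateFn,
    prompts: Iterable[str],
    approve: ApproveFn,
    draft_quality: str = "low",    # assumed name for the cheap setting
    final_quality: str = "high",   # assumed name for the expensive setting
) -> Optional[Tuple[str, bytes]]:
    """Try prompts at draft quality; render the first approved one at final quality."""
    for prompt in prompts:
        draft = generate(prompt, draft_quality)
        if approve(draft):
            return prompt, generate(prompt, final_quality)
    return None  # no draft was approved, so no final render was paid for


# Stand-in generator that records which quality settings were requested.
calls = []


def fake_generate(prompt: str, quality: str) -> bytes:
    calls.append(quality)
    return f"{quality}:{prompt}".encode()


def sign_text_present(image: bytes) -> bool:
    # Stand-in for a human (or automated) review of the draft.
    return b"Apple-a-day" in image


result = draft_then_final(
    fake_generate,
    [
        "A fruit cart tumbles down some stairs.",
        "A fruit cart tumbles down some stairs. The fruit cart sign reads 'Apple-a-day'",
    ],
    sign_text_present,
)
print(calls)  # → ['low', 'low', 'high']
```

This only pays off when drafts predict finals, which is why consistency between quality settings matters.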
What’s Next?
Let me know by Twitter DM which image generation provider you’re planning to use!