Goodbye Apple-a-dog – OpenAI gpt-image-1-mini vs gpt-image-1 – November 2025

Comparing OpenAI Image Generation Models November 2025: A Comprehensive Quality Analysis of gpt-image-1-mini (new) vs gpt-image-1 (existing)

After my detailed look at gpt-image-1 quality, I was keen to run the same analysis on OpenAI’s latest model, gpt-image-1-mini.

The Testing Approach

For this expanded analysis, I selected two prompts specifically designed to test different aspects of image generation capabilities:

  1. "A person calms a rearing horse" – This tests the model’s ability to handle human figures, dynamic action, and anatomical accuracy of both humans and animals in interaction.
  2. "A fruit cart tumbles down some stairs. The fruit cart sign reads 'Apple-a-day'" – This challenges the model with complex physics (tumbling objects), multiple elements (fruit, cart, stairs), and text rendering.

Following Edward Tufte’s visualization principles, I’ve arranged the outputs in small multiples to facilitate direct visual comparison. For each prompt, I generated multiple images across different quality settings to evaluate consistency, feature accuracy, and stability.
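
To make the small-multiples grid reproducible, here is a minimal sketch of the generation loop, assuming the official OpenAI Python SDK. The loop structure, file names, and one-image-per-cell simplification are my own illustration rather than the exact script behind these figures.

```python
# Minimal sketch: generate each prompt at each quality setting.
# Model and quality names are those discussed in this article;
# everything else (loop, file names) is illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = {
    "horse": "A person calms a rearing horse",
    "applecart": ("A fruit cart tumbles down some stairs. "
                  "The fruit cart sign reads 'Apple-a-day'"),
}

for model in ("gpt-image-1", "gpt-image-1-mini"):
    for label, prompt in prompts.items():
        for quality in ("low", "medium", "high"):
            result = client.images.generate(
                model=model,
                prompt=prompt,
                quality=quality,
            )
            # The gpt-image-1 family returns base64-encoded image data.
            image_bytes = base64.b64decode(result.data[0].b64_json)
            with open(f"{label}-{model}-{quality}.png", "wb") as f:
                f.write(image_bytes)
```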

For the legacy models dall-e-3 and dall-e-2, see the previous article.

Findings: gpt-image-1-mini “A person calms a rearing horse”

Horse quality comparison: gpt-image-1 vs gpt-image-1-mini – low, medium, high

Click for an enormous zoomable version: large 6323x2700px raw link

Technical Note

The middle column was made by gpt-image-1-mini with exactly the same prompt as last time: "A person calms a rearing horse". Something has changed in the default style of the mini model such that the output was a drawing; previously, the default output style of gpt-image-1 had been photographic realism. To aid comparison, I also generated the rest with a slightly modified prompt, "realistic photo - a person calms a rearing horse". This is a mild change which should preserve most of the original characteristics for side-by-side comparison between the models. Generations with the modified prompt are marked with an asterisk (*).

gpt-image-1-mini Low Glitches

Compared with gpt-image-1, there are not many glitches at this quality level.

  • Left Hand Side
    • The person’s outstretched hand has mangled fingers

gpt-image-1-mini Medium Glitches

There were no major glitches that I spotted in Medium.

gpt-image-1-mini High Glitches

Looking at High, there are still some glitches, but they are more subtle.

  • Left Hand Side
    • The person’s fingers are slightly mangled

Comparison with gpt-image-1 model

The ‘person’ is plainer and tends not to have a hat. Fewer glitches were detected overall.

Findings: gpt-image-1-mini “A fruit cart tumbles down some stairs. The fruit cart sign reads ‘Apple-a-day’”

Applecart Quality Comparison gpt-image-1

Click for an enormous zoomable version: large 4375x2700px raw link

This more complex prompt highlighted several capability edges:

  • Text rendering: The “Apple-a-day” text was rendered correctly in all gpt-image-1-mini outputs 🙂
  • Fruit fidelity:
    • High: Good variety
    • Medium: Okay; one image had less variety of fruit
    • Low: Draft quality

Comparison with previous gpt-image-1 model

Text rendering: The “Apple-a-day” text was rendered correctly in all gpt-image-1-mini outputs 🙂. Previously, gpt-image-1’s low-quality outputs rendered it as “Apple-a-dog” and “Apple e day”.

Low-quality text adherence has improved from gpt-image-1 to gpt-image-1-mini.

Quality Settings Impact Results

The API’s quality settings show less differentiation now. Low quality is a much better baseline, especially with the improved text adherence.

Higher quality settings still produced better results. Check the very detailed raw images linked above to make up your own mind.

Practical Implications

For practical applications, these findings suggest:

  1. The baseline has improved: Low is no longer ‘draft’ quality and may be good enough to use.
  2. Choose quality settings to taste: because “low” may be good enough, carefully evaluate whether your use case requires more (see the sketch below).
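
As a purely illustrative starting point, a tiny helper like the one below can encode that decision. The criteria and their mapping to quality levels are my own assumptions for the sketch, not OpenAI guidance.

```python
# Toy illustration of "low unless proven otherwise": default to the cheap
# setting and escalate only when the use case demands it. The criteria and
# thresholds here are assumptions made for this sketch.
def pick_quality(needs_fine_text: bool, is_final_asset: bool) -> str:
    """Return an Images API quality setting: 'low', 'medium' or 'high'."""
    if is_final_asset and needs_fine_text:
        return "high"    # e.g. hero imagery that must show small, legible text
    if is_final_asset or needs_fine_text:
        return "medium"  # one demanding requirement, but not both
    return "low"         # drafts, thumbnails, iteration - the improved baseline


print(pick_quality(needs_fine_text=False, is_final_asset=False))  # -> low
```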

What’s Next?

Would you be interested in seeing a regular sample of image generations using precisely the same prompts to monitor generation stability over time? This could provide valuable insights into how the model evolves with updates and fine-tuning.

I’m also considering expanding this analysis to include more specialized prompts targeting specific capabilities like architectural rendering, facial expressions, or complex lighting scenarios.