OpenAI just dropped a monster upgrade to ChatGPT’s image generation, and it’s one of those moments where you blink, look again, and start questioning reality.
I won’t waste your time with numbers, model sizes, or how many bazillion GPU hours the new model chews through. I’m just gonna show you what this thing can do—and how it stacks up against the older DALL-E model.
7
Hands and Fingers
A close-up of someone playing an E minor chord on a guitar, fingers pressing down on the strings with shallow depth of field.
AI image generation blew our minds when it first went mainstream. And then… we looked closer. The hallmark sign of an AI image is the weird hand and finger anatomy. So, what better way to test the models than to ask them to depict a guitar chord?
To save the best for last, I asked the original DALL-E model first, and then the new image generator integrated into the ChatGPT 4o model.
Above is what DALL-E came up with. Despite DALL-E’s shortcomings, it actually handled the fingers and general anatomy decently here. But the chord itself … not so much. The hand’s positioned way too high on the fretboard to be playing E minor. If you zoom in a bit, you’ll catch that the guitar has more than seven strings. The spacing between the strings is also all over the place.
With that in mind, let’s move on to ChatGPT 4o.
I could’ve told you I’m joking and that this is actually an old photo from back when I played guitar. ChatGPT 4o is that good. Six strings, evenly spaced, and the chord is actually E minor. I’m impressed.
6
Historical Figures
Albert Einstein eating an ice cream in Central Park, wearing a casual shirt and suspenders.
Now that we’ve gotten our hands (and fingers) dirty, let’s mess with some faces. I figured we’d try historical figures since they won’t get offended, and it would be fun to see them in a modern setting.
A total letdown. To be fair, DALL-E did warn me it couldn’t use Einstein himself and would go with someone “closely resembling” him instead. One of DALL-E’s classic tells is its cartoonish-yet-realistic style, which shows up in full force here.
The San Remo in the background does hint that this is Central Park, but that’s about the only win here. Moving on to ChatGPT 4o.
Slap a black-and-white filter on it, and I could’ve convinced you it’s a real vintage photo. The cream on the cone looks properly creamy, Albert’s rocking his signature nonchalant vibe, and the San Remo is still back there, standing tall. Everything checks out. ChatGPT 4o nailed it.
5
Fictional Figures
A figure similar to a Sith Lord calling for a taxi in George Square, Glasgow, with light rain and traffic lights in the background.
By now we’ve seen that ChatGPT can paint historical figures pretty well. Since faces and people are still one of the best ways to stress-test an AI, let’s try some more.
I went with “similar” to get the bot to cooperate without hitting me with the copyright speech. DALL-E’s result is okay. The figure does remind you of a Sith, and the rest of the elements are more or less accurate.
There’s nothing explicitly cartoonish about it, but it just doesn’t feel real. Want real? Check out what ChatGPT 4o produced with the same prompt:
I love the atmosphere—the lighting, the drizzle, the brooding Sith lord presence. It’s all there. The only problem is that our dark lord is standing in the street calling a taxi while facing… the sidewalk. Oh, and the taxi sign says “TAXL.”
Let’s pivot from future fiction to historical fiction. Something like:
A character similar to Geralt of Rivia shopping for groceries in a modern supermarket, pushing a cart and frowning at canned goods.
Not bad at all. The image still carries that synthetic cartoony vibe and the text on the cereal boxes is total gibberish, as expected.
ChatGPT 4o initially refused the prompt because of copyright, but it worked once I swapped “similar to” for “resembling.” Behold:
I’m speechless. As with most people, ChatGPT’s interpretation of Geralt is basically just Henry Cavill rather than the video game version, but it nailed it. The scowl is on point, and the setting feels natural.
This could pass as a shot from the set of a weird crossover ad. And yes, I read The Witcher books before the show was a thing.
4
Cartoons
A cartoon-style pirate captain with a long red coat and a cybernetic arm, laughing on the deck of a flying ship. Transparent background.
OpenAI’s image generation isn’t limited to realism. While DALL-E always leans a bit airbrushed no matter what you throw at it, I decided to push both models into full cartoon mode.
DALL-E actually did a solid job here, and it even understood the request for a transparent background. Sort of. What we got was the classic gray-and-white checkerboard pattern that usually means transparent, except here it’s baked into the image. So, not transparent at all.
Also, ironically, our AI pirate’s biological hand has four fingers while the cybernetic one has five. Maybe he chromed the wrong arm?
ChatGPT 4o’s version feels sharper and more deliberate. The coloring style is different—whether it’s better or not is subjective—but it clearly looks like an artist meant to draw it that way.
The background is also actually transparent. You could slap this on a T-shirt, print it out, or even turn it into a WhatsApp sticker on the spot.
3
Mirrors and Reflections
A modern bathroom sink with a toothbrush and razor on the counter, both visible in the mirror and real-world view—lighting is soft and even.
Mirrors reflect—and reflections need spatial logic to look natural. I threw out a prompt I knew DALL-E would fumble.
As expected. Something is trying to be a reflection from the faucet in the mirror, but it’s way too long. The toothbrush is levitating, inside the sink, and casting no reflection. DALL-E really strapped on its AI helmet for this one.
The newer model does a much better job of making the image feel real, like an actual photograph. The faucet’s reflection is a little skewed but passable. Then there’s the toothbrush, which has a reflection but doesn’t exist in the physical world—like a reverse vampire.
No clear winner here. AI results are inconsistent, so I gave both another shot with something a little more ambitious:
A woman standing in front of a full-length mirror in a sunlit bedroom, her outfit and pose mirrored exactly, with visible reflection of the window behind her.
… I don’t even want to dignify this one with an analysis. Folks, if you want to make DALL-E look bad, just toss the word “mirror” into your prompt. Moving on.
As expected, ChatGPT 4o’s version looks a lot more realistic—but maybe a bit surreal this time? The woman’s pose and outfit are mirrored, but only partially, like a Photoshop 3D pop-out effect. The reflection angles are also off. AI still can’t handle spatial logic.
2
Cars and Streets
A 2006 Ford GT and a Peugeot 206 behind a red traffic light on Wall Street, New York, midday.
I’m a car enthusiast. When AI image generators first hit the scene, one of the first things I tried was making photos of cars. The results back then weren’t good, but with the new model out, I had to give it another shot.
There goes DALL-E again with its increasingly annoying cartoon aesthetic. The Peugeot is on the sidewalk, the traffic lights I asked for are facing the buildings, and the plate numbers are all gibberish.
ChatGPT 4o’s results are significantly better. The cars are properly depicted—even the Peugeot’s wheel cover is spot-on and era-correct. That kind of detail isn’t accidental. But it gets even better:
I could actually use this one as my phone wallpaper. The lighting, the composition, the reflections—it all checks out. Other than the weird emptiness of the street, this could straight-up pass for a real photo.
1
Text and Letters
A handwritten letter on aged paper with cursive script, resting next to a fountain pen and an ink bottle.
Finally, we aim at the Achilles’ heel of AI image generators: most of them struggle to get text right. By now, you’ve seen enough gibberish from DALL-E in the earlier examples to know what I mean.
To make it more interesting—and consistent—I added that the letter should contain the text of King Terenas’ speech to Arthas from Warcraft III.
DALL-E did what it does best with text: turned most of it into smudgy, unintelligible gibberish. It did manage to get some words right, and the atmosphere works: the pen and ink bottle look solid.
ChatGPT 4o nails it—every single word, in clean cursive script. Letter-perfect. Compared to DALL-E, this is a massive leap forward. Hats off, OpenAI.

AI image generation has come a long way—and it shows. ChatGPT 4o feels like the first model that genuinely gets it when it comes to lighting, texture, and context.
At this point, the only real question left is: how strong are ChatGPT’s safeguards? I easily got past its copyright restrictions. How long before someone jailbreaks ChatGPT and starts generating whatever content they want using this absurdly capable model?