ChatGPT Images 2.0

133

mmeetpateltech • about 5 hours ago • 30 comments • openai.com

Discussion (30 comments, Hacker News)

vunderba•about 3 hours ago

OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site, which focuses heavily on prompt adherence, with both hovering around a 70% success rate on the generative and editing prompts. The caveat is that Gemini has always had the edge in visual fidelity.

That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”

I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.

Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts, especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.
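A quick sanity check on the scoring arithmetic (the `pass_rate` helper is purely illustrative, not part of GenAI Showdown): 11 of 15 prompts works out to about 73%, which is where "around 70%" comes from, and each additional prompt passed is worth about 6.7 percentage points.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the share of prompts passed, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


# Best score before gpt-image-2, per the comment: 11 of 15 prompts.
print(f"11/15 = {pass_rate(11, 15):.1f}%")  # 11/15 = 73.3%
# One more prompt passed: 12 of 15.
print(f"12/15 = {pass_rate(12, 15):.1f}%")  # 12/15 = 80.0%
```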

For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:

https://genai-showdown.specr.net/image-editing?models=nbp3,s...

And here’s the same comparison for generative performance:

https://genai-showdown.specr.net/?models=s4,nbp3,g15

UPDATES:

gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.

Results are in for the generative (text-to-image) capabilities: gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:

- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.

- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.

- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
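The D20 prompt is easy to get subtly wrong even by hand. A minimal sketch using plain trial division (an illustrative check, not something taken from the benchmark itself) confirms that the list in the prompt really is the first twenty primes, one per face:

```python
def first_primes(n: int) -> list[int]:
    """Return the first n primes using simple trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        # Prime if no already-found prime p with p*p <= candidate divides it.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


faces = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
print(first_primes(20) == faces)  # True
print(len(faces))                 # 20
```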

All Models:

https://genai-showdown.specr.net

Just Gpt-Image-1.5, Gpt-Image-2, Nano-Banana 2, and Seedream 4.0

https://genai-showdown.specr.net?models=s4,nbp3,g15,g2

ea016•about 4 hours ago

Price comparison:

GPT Image 2 (per image)

  Quality   1024×1024   1024×1536   1536×1024
  Low       $0.006      $0.005      $0.005
  Medium    $0.053      $0.041      $0.041
  High      $0.211      $0.165      $0.165

GPT Image 1 (per image)

  Quality   1024×1024   1024×1536   1536×1024
  Low       $0.011      $0.016      $0.016
  Medium    $0.042      $0.063      $0.063
  High      $0.167      $0.25       $0.25
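To make the comparison easier to read, here is a small sketch that computes the percentage change from GPT Image 1 to GPT Image 2 for each quality tier and size. The prices are copied from the table in this comment; the dictionary layout and the `percent_change` helper are just illustrative choices.

```python
# Per-image prices in USD, copied from the comparison above; keys are (quality, size).
GPT_IMAGE_1 = {
    ("low", "1024x1024"): 0.011, ("low", "1024x1536"): 0.016, ("low", "1536x1024"): 0.016,
    ("medium", "1024x1024"): 0.042, ("medium", "1024x1536"): 0.063, ("medium", "1536x1024"): 0.063,
    ("high", "1024x1024"): 0.167, ("high", "1024x1536"): 0.25, ("high", "1536x1024"): 0.25,
}
GPT_IMAGE_2 = {
    ("low", "1024x1024"): 0.006, ("low", "1024x1536"): 0.005, ("low", "1536x1024"): 0.005,
    ("medium", "1024x1024"): 0.053, ("medium", "1024x1536"): 0.041, ("medium", "1536x1024"): 0.041,
    ("high", "1024x1024"): 0.211, ("high", "1024x1536"): 0.165, ("high", "1536x1024"): 0.165,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means cheaper)."""
    return 100.0 * (new - old) / old


changes = {key: percent_change(GPT_IMAGE_1[key], GPT_IMAGE_2[key]) for key in GPT_IMAGE_1}

for (quality, size), change in changes.items():
    print(f"{quality:<6} {size}: {change:+6.1f}%")
```

On these numbers, the square 1024×1024 size gets about 26% more expensive at medium and high quality, while every other combination gets cheaper (the low-quality non-square sizes by nearly 70%).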

kibibu•about 2 hours ago

Genuine question: what positive use cases are sufficient to accept the harm from image generators?

One that I can think of:

- replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.

Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society".

On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.

bulletsvshumans•15 minutes ago

Democratizing visual communication is arguably useful, for instance helping people create diagrams that illustrate a concept they wish to convey. This is contingent, though, on the tech working well enough that the visuals communicate more effectively than the text that went into producing them.

chromacity•about 1 hour ago

How else do you expect me to illustrate my LLM-generated blog posts about AI?

2ndorderthought•33 minutes ago

Oh my. You still make those? Ever since model chupacobra 2.46 we have AI agents making those for us. At one point I was on the fence about totally outsourcing it to agents but it's way more efficient. Now I have 50 posts a day under different names.

atleastoptimal•31 minutes ago

The problem is that I'd prefer access to near-photorealistic image gen to be commodified rather than restricted, because under restriction only those willing to skirt the law or able to leverage criminal networks would have access to it.

spijdar•about 1 hour ago

The same question could be posed about art in general. I know that response would (and probably should) ruffle people's figurative feathers, but I think it's worth considering. A lot of art isn't "necessary for society".

The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".

TomGarden•about 1 hour ago

I used to think the way you describe, but I've fallen on the side of "art is just more emotionally resonant human communication", and, most of the time, human communication with more effort and thought behind it. AI art falls short on both counts: it isn't human, and on average it doesn't have more effort or thought behind it than your typical interaction at the supermarket.

I will say it can be emotionally resonant, but that's a property borrowed from the perception of the human communication and effort behind the art the models were trained on.

tills13•about 1 hour ago

The difference between "art in general" and this is scale and speed. Sure, I'll grant you that people are going to engage in deception with or without this, but the barrier to entry with this is literally on the floor. Do you have a $5 prepaid VISA? You can generate whatever narrative you want in 30 seconds. Replace the $5 prepaid VISA with the pocketbook of a three-letter agency and it starts getting crazy.

Barbing•43 minutes ago

>starts getting crazy

Got pretty wild w/the Iranian propaganda that reportedly _resonated with Americans_ (didn't verify that claim)

Slopaganda - https://www.newyorker.com/culture/infinite-scroll/the-team-b...

Jtarii•18 minutes ago

If you want to say the complete destruction of truth is worth it because some people are having "fun" then idk.

SpicyLemonZest•9 minutes ago

I was worried about the complete destruction of truth, but it seems that's not the result of commoditized image generation. False AI-generated images have been widespread for years, and as far as I've seen, society has adapted very well to the understanding that images can't prove anything without detailed provenance. I'd argue that this has been helped, actually, by random people on the Internet routinely generating plausible images of events that obviously didn't happen.

nothinkjustai•about 1 hour ago

Art is for the producer: if they feel it’s necessary to produce it, then it’s necessary for them, and what is necessary for the individual extends to the society they’re in.

NathanielK•9 minutes ago

Ok, but the models only know what to draw because we fed them images of dementia patients and babies.

Maybe image generators can be a loophole for consent legally, but it seems even grosser morally.

ticulatedspline•29 minutes ago

Is the argument any different replacing the word "image generators" with "photoshop" ?

JumpCrisscross•12 minutes ago

> Genuine question: what positive use cases are sufficient to accept the harm from image generators?

Diagrams and maps. So much text-based communication begs for a diagram or a map.

_pdp_•27 minutes ago

There are many use-cases outside of spam and slop.

For example, take a picture of your garden. Ask ChatGPT for ideas on how to improve it, along with a step-by-step visual guide.

Anything that can be expressed visually is effectively a target for this technology, and that covers pretty much everything.

LZ_Khan•29 minutes ago

Saving money for businesses trying to promote their products?

tantalor•40 minutes ago

Prototyping. Suppose you have a hard time expressing your vision in words or executing it visually.

1. Generate 100s or 1000s of low-fidelity candidates, find something that matches your vision, iterate.

2. Hand that generated image off to a human and say, "This is what I'm thinking of, now how do we make it real?"

Important: do not skip the last step.

ndriscoll•about 1 hour ago

Not much beyond food, water, and shelter is "necessary" for society, but it's nice to have nice things.

I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.

I could come up with ten three-word sentences myself, of course, but I can't draw well enough to make a coloring book out of them (in fact, she's nearly as good as me). It also gets me thinking about a grander idea: turning this into something a little more powerful that can track progress (e.g. which phonemes or sight words are mastered and which to introduce or focus on), automatically generate material in a more principled way, add my kids into the stories with illustrations that look like them, etc.
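The progress-tracking idea here (which sight words are mastered, which to focus on, which to introduce next) could start as something as simple as the sketch below. Everything in it is hypothetical: the `ReadingTracker` class, the "three correct reads in a row" mastery rule, and the word list are illustrative assumptions, not anything the commenter specified.

```python
from dataclasses import dataclass, field


@dataclass
class ReadingTracker:
    """Hypothetical tracker: a word counts as mastered after `streak_needed` correct reads in a row."""
    curriculum: list[str]  # words in the order they should be introduced
    streak_needed: int = 3
    streaks: dict[str, int] = field(default_factory=dict)

    def record(self, word: str, correct: bool) -> None:
        # A correct read extends the streak; a miss resets it to zero.
        self.streaks[word] = self.streaks.get(word, 0) + 1 if correct else 0

    def mastered(self) -> set[str]:
        return {w for w, s in self.streaks.items() if s >= self.streak_needed}

    def focus_words(self, limit: int = 3) -> list[str]:
        """Words already introduced but not yet mastered, in curriculum order."""
        done = self.mastered()
        return [w for w in self.curriculum if w in self.streaks and w not in done][:limit]

    def next_new_words(self, limit: int = 2) -> list[str]:
        """Curriculum words that have not been introduced yet, in order."""
        return [w for w in self.curriculum if w not in self.streaks][:limit]


tracker = ReadingTracker(curriculum=["the", "see", "dog", "run", "big"])
for _ in range(3):
    tracker.record("the", correct=True)
tracker.record("see", correct=True)
tracker.record("see", correct=False)
print(tracker.mastered())        # {'the'}
print(tracker.focus_words())     # ['see']
print(tracker.next_new_words())  # ['dog', 'run']
```

The output of `focus_words()` and `next_new_words()` is the kind of input a story generator could be prompted with, so each new reader mixes review words with a couple of new ones.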

Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!

drivebyhooting•about 1 hour ago

Repetition rather than novelty is good for learning.

ndriscoll•41 minutes ago

Sure, and she gets that, but at some point she completely memorizes the stories. She also asks if we can get new books at the store, but they don't make 'em that fast.

mcmcmc•about 1 hour ago

So the use case is just IP theft so you can get more Paw Patrol?

AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems a great way to destroy a child’s natural curiosity.

ndriscoll•24 minutes ago

Sure, I don't view "IP" as valid, don't consider it theft, and absolutely don't care. In fact I'd go so far as to say that holding the position that there's something wrong with tailoring teaching to a child's interests and avoiding that for fear of copyright concerns of all things actually makes you morally bad.

You overestimate how many there are. There's like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.

lanthissa•27 minutes ago

people pay them to use it, they find that positive

infecto•about 1 hour ago

Couldn't the same argument be applied to practically everything, with people holding drastically different perspectives?

stackedinserter•44 minutes ago

I have plenty for you:

- package design

- pictures for manuals and guides

- navigation and signs

- booklets, tickets and flyers

- logos of all sorts

- websites

- illustrations for books

And many, many others. Not every image is art and very few illustrators are artists.

Jtarii•9 minutes ago

So the benefits are that something that was already being mass produced with no issue is slightly easier to mass produce?

It's not a particularly compelling argument.

pesus•33 minutes ago

How do these justify the costs to society?

Legend2440•25 minutes ago

The 'costs to society' are massively overblown, and some of them (automating jobs) are actually benefits to society.

throwaway2027•about 4 hours ago

I know people like to dunk on ChatGPT and Gemini and say Claude is (or used to be) better, but you can still use worse models when you're out of usage AND make use of Nano Banana and ChatGPT image generation, which have separate limits on your subscription. I think that could make it a better package overall for some people (non-programmers). I do like having the option, and I'm excited to see what improvements they've made to ChatGPT image generation. In the past it had this yellow piss filter; 1.5 sort of fixed it but made things really generic, with Nano Banana beating it (although Gemini also had racial-bias tuning that was too aggressive, which they fixed). It seems the images ChatGPT generates have gotten better.

joegibbs•about 2 hours ago

The quality of the text is really impressive and I can’t seem to see any artefacts at all. The fake desktop is particularly good: Nano Banana would definitely slip up with at least a few bits of the background.

6thbit•about 4 hours ago

System card link with safety details https://deploymentsafety.openai.com/chatgpt-images-2-0

direct pdf https://deploymentsafety.openai.com/chatgpt-images-2-0/chatg...

samiwami•about 4 hours ago

do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?

I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.

Legend2440•about 4 hours ago

I think we are just going to have to accept that realistic images can be easily fabricated now.

Seeing is not believing anymore, and I don't think SynthID or anything like it can restore that trust in images.

louiereederson•about 4 hours ago

The image of the messy desktop with the ASCII art is so impressive - the text renders, the date is consistent, it actually generated ASCII art in "ChatGPT", etc. I was skeptical that it was cherry-picked but was able to generate something very similar and then edit particular parts on the desktop (i.e. fixing content in the browser window and making the ASCII dog "more dog like"). It's honestly astounding, to me at least.

Melatonic•about 3 hours ago

We were afraid it would be Skynet and instead we got the ultimate meme generator!

throw310822•about 3 hours ago

Ok, I can hear the sound of entire industries crumbling right now.

thevinter•about 4 hours ago

Every time a new image gen comes out I keep saying it won't get better, only to be surprised again and again. Some of the examples are incredible (and incredibly scary). I feel like this is truly the point where telling whether something is AI becomes impossible.

minimaxir•about 4 hours ago

Model card for the API endpoint gpt-image-2 (which may or may not reflect the output from ChatGPT Images 2): https://developers.openai.com/api/docs/models/gpt-image-2

API pricing is mostly unchanged from gpt-image-1.5; the output price is slightly lower: https://developers.openai.com/api/docs/pricing

...buuuuuuuuut the price per image has changed. For high-quality image generation, the 1024x1024 price has increased? That doesn't make sense, since a 1024x1024 image should be cheaper than a 1024x1536 one, so I'm assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...
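One way to see why the high-quality prices look inverted is to normalize by pixel count. A minimal sketch using the gpt-image-2 high-quality prices from the table earlier in the thread ($0.211 for 1024x1024, $0.165 for 1024x1536); the function and dictionary names are illustrative.

```python
def price_per_megapixel(price_usd: float, width: int, height: int) -> float:
    """Price divided by image area in megapixels (1 MP = 1,000,000 pixels)."""
    return price_usd / (width * height / 1_000_000)


# (width, height, price in USD) for gpt-image-2 at high quality, from the table earlier in the thread.
HIGH_QUALITY = {
    "1024x1024": (1024, 1024, 0.211),
    "1024x1536": (1024, 1536, 0.165),
}

for name, (w, h, price) in HIGH_QUALITY.items():
    print(f"{name}: ${price:.3f} per image, ${price_per_megapixel(price, w, h):.3f} per MP")

sq_w, sq_h, sq_price = HIGH_QUALITY["1024x1024"]
tall_w, tall_h, tall_price = HIGH_QUALITY["1024x1536"]
# A bigger image normally costs at least as much in total; flag the case where it costs less.
inverted = tall_w * tall_h > sq_w * sq_h and tall_price < sq_price
print("larger image is cheaper:", inverted)
```

Per megapixel, the square image comes out at roughly $0.20 versus roughly $0.10 for the 1024x1536 one, nearly twice as much, which is the inversion this comment is pointing at.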

The submitted page is annoyingly uninformative, but from the livestream it purports to have exactly the same features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.

ieie3366•about 3 hours ago

It's great. It also doesn't seem to have a standard "slop" look; the images it produces are quite diverse.

I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.

retrac98•about 3 hours ago

The page keeps crashing on my iPhone 17 Pro.

Bennettheyn•about 3 hours ago

fal has the endpoint under openai/gpt-image-2

ChrisArchitect•about 3 hours ago

Fake layouts, fake handwritten kid story, fake drunk photos? All from training on real things people did.

As with anything AI, we are not ready for the scale of impact. And for what? Like, why are you proud of this?