Can a Language Model Paint?

liamlaverty•2 days ago

I've been trying to get some language models to paint one stroke at a time for a few months now. I thought this community would be interested to see the results.

The article runs through my findings, and there's a linked technical rundown of how the app was built. There's also an interactive gallery [0] of my attempts. You can point an agent at the API docs [1], and they might (ymmv) do a painting themselves.

[0] https://www.liamlaverty.com/paint-by-language-model/
[1] https://www.liamlaverty.com/paint-by-language-model/draw/api

mountainriver•about 2 hours ago

Very cool! I’ve been trying this quite a bit too

mock-possum•about 1 hour ago

I do like this one https://www.liamlaverty.com/paint-by-language-model/inspect/...

It’s a bit disappointing that it wasn’t literally painted, just digitally simulated.

jamilton•about 2 hours ago

Neat. I wonder if allowing the models to inspect pixels or pixel regions, instead of relying entirely on the VLM, would help at all. The spatial reasoning required might be too complex, though. In general the VLM seems to be a limiting factor, so I wonder if there's some way to usefully augment it or sidestep its limitations.
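A minimal sketch of what a pixel-region inspection tool could look like, assuming the canvas is held as a row-major grid of RGB tuples (`canvas[y][x]`). The name `inspect_region` and its return fields are hypothetical and are not part of the app's API [1]; the idea is that the model asks for a box and gets back numbers (mean colour, most common colour, and how much of the box that colour covers) instead of relying on the VLM's reading of the image.

```python
from collections import Counter

RGB = tuple[int, int, int]


def inspect_region(canvas: list[list[RGB]], x0: int, y0: int, x1: int, y1: int) -> dict:
    """Summarise pixels in the half-open box [x0, x1) x [y0, y1).

    Hypothetical tool: canvas is indexed canvas[y][x], and the box is
    clamped to the canvas bounds before sampling.
    """
    height, width = len(canvas), len(canvas[0])
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    pixels = [canvas[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not pixels:
        raise ValueError("region is empty after clamping")

    n = len(pixels)
    # Per-channel mean, rounded to whole colour values.
    mean = tuple(round(sum(p[c] for p in pixels) / n) for c in range(3))
    # Most frequent exact colour and the fraction of the box it fills.
    dominant, count = Counter(pixels).most_common(1)[0]
    return {"mean": mean, "dominant": dominant, "coverage": count / n}


# Usage: a 4x4 white canvas with a red 2x2 square in the top-left corner.
WHITE, RED = (255, 255, 255), (255, 0, 0)
canvas = [[RED if x < 2 and y < 2 else WHITE for x in range(4)] for y in range(4)]
print(inspect_region(canvas, 0, 0, 4, 4))
```

Returning a compact summary rather than raw pixels keeps the tool output short enough to fit in the model's context on every stroke.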

Like, instead of a pseudo-MS Paint, a pseudo-Photoshop with manipulable layers and bounding boxes. They struggle to add an outline to something previously drawn, but that's something that could be done programmatically. The limitations are obviously part of what makes this interesting, but different limitations could be interesting, too. Maybe additional complexity would just result in more uninteresting failures, though; I don't know.
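A rough sketch of the "outline it programmatically" idea, assuming the app could keep a boolean mask per layer recording which cells an earlier element covered. `outline_pixels` and `add_outline` are hypothetical helpers, not anything in the app: the outline is taken to be every cell outside the shape that touches it in the 4-neighbourhood, which is one simple choice among several (an 8-neighbourhood or a thicker stroke would also work).

```python
def outline_pixels(mask: list[list[bool]]) -> set[tuple[int, int]]:
    """Return (x, y) cells outside the shape that are 4-adjacent to it."""
    height, width = len(mask), len(mask[0])
    result = set()
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                continue  # cells inside the shape are never part of the outline
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and mask[ny][nx]:
                    result.add((x, y))
                    break
    return result


def add_outline(canvas: list[list], mask: list[list[bool]], color) -> list[list]:
    """Paint the outline of a previously drawn shape onto canvas in place."""
    for x, y in outline_pixels(mask):
        canvas[y][x] = color
    return canvas


# Usage: a single filled cell in the middle of a 5x5 layer.
mask = [[x == 2 and y == 2 for x in range(5)] for y in range(5)]
print(sorted(outline_pixels(mask)))
```

The model would then only need to say "outline layer N in black" rather than place every stroke around the shape itself.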

I noticed that the feedback/strengths/suggestions outputs are clearly also given the initial image's prompt. It could be useful to additionally have an output that's not given the prompt, so the LLM knows what the VLM sees without bias?

gus_massa•1 day ago

