
Discussion (7 Comments)

liamlaverty•2 days ago
I've been trying to get some language models to paint one stroke at a time for a few months now. I thought this community would be interested to see the results.

The article runs through my findings, and there's a linked technical rundown of how the app was built. There's also an interactive gallery [0] of my attempts. You can point an agent at the API docs [1], and they might (ymmv) do a painting themselves.

[0] https://www.liamlaverty.com/paint-by-language-model/

[1] https://www.liamlaverty.com/paint-by-language-model/draw/api

mountainriver•about 2 hours ago
Very cool! I’ve been trying this quite a bit too
mock-possum•43 minutes ago
I do like this one https://www.liamlaverty.com/paint-by-language-model/inspect/...

It’s a bit disappointing that it wasn’t literally painted, just digitally simulated.

jamilton•about 2 hours ago
Neat. I wonder if allowing the models to inspect pixels or pixel regions, instead of fully relying on the VLM, would help at all. The spatial reasoning required might be too complex, though. In general the VLM seems to be a limiting factor, so I wonder if there's some way to usefully augment it or sidestep its limitations.
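A rough sketch of what that pixel-region inspection could look like. This assumes the canvas is just a grid of RGB tuples (a stand-in — the app's actual representation isn't described in this thread); the tool returns compact statistics a model can reason about instead of a raw image:

```python
def inspect_region(canvas, left, top, right, bottom):
    """Summarise a rectangular region of the canvas.

    canvas: list of rows, each row a list of (r, g, b) tuples.
    Returns the region's dimensions and mean colour — the kind of
    small, text-friendly feedback a language model could use
    without needing a VLM to describe the pixels.
    """
    pixels = [canvas[y][x]
              for y in range(top, bottom)
              for x in range(left, right)]
    n = len(pixels)
    mean = tuple(sum(p[c] for p in pixels) // n for c in range(3))
    return {"width": right - left, "height": bottom - top, "mean_rgb": mean}
```

A tool like this could be exposed alongside the paint strokes, so the model queries "what colour is the top-left quadrant?" and gets numbers back rather than relying on the VLM's description.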

Like, instead of pseudo-MSPaint, a pseudo-Photoshop with manipulable layers and bounding boxes. The models struggle to add an outline to something previously drawn, but that's something that could be done programmatically. The limitations are obviously part of what makes this interesting, but different limitations could be interesting, too. Maybe additional complexity would just result in more uninteresting failures, though; I don't know.

I noticed that the feedback/strengths/suggestions outputs are clearly also given the initial image's prompt. It could be useful to additionally have an output that's not given the prompt, so the LLM knows what the VLM sees without bias?

gus_massa•1 day ago
You may enjoy

* "The last six months in LLMs, illustrated by pelicans on bicycles" https://simonwillison.net/2025/Jun/6/six-months-in-llms/ (https://news.ycombinator.com/item?id=44215352 | 962 points | 11 months ago | 239 comments)

* "Using “underdrawings” for accurate text and numbers" https://samcollins.blog/underdrawings/ (https://news.ycombinator.com/item?id=47977990 | 379 points | 9 days ago | 138 comments)

bizer•2 days ago
Good attempt. Compared to diffusion model output, these paintings look more like they were created by humans.
baCist•2 days ago
LLMs can draw (play music, write books), but they imitate, not create.