Discussion (51 Comments)
If you ask humans to write 1,000 books, you're asking 1,000 different humans with different experiences and different skills and different moods (etc.) to write those books.
But if you ask LLMs to write 1,000 books, you're probably only talking to 3 or 5 different models, tops. And they've all trained on the same or similar data, and are trained to respond in very similar ways.
The LLMs don't differ much in anything like "life experience" or "skills", and they don't really have anything like a "mood" independent of the prompts you've given them.
I mostly agree, but this is a very simplified explanation. The models are indeed trained to respond in similar ways, for "basic" prompts. And that's as much a feature as it is a bug. In other words, the bug becomes apparent only if you give 100+ basic prompts. But giving it 100+ basic prompts and expecting originality is a silly endeavour. That's not how you get originality.
The way I'd go about generating 1000 books while expecting different outcomes is something along these lines (and nowadays you can ask your favorite LLM to wire up this workflow for you, with decent outcomes):
1. Ask for a list of 20 features that define a book (genre, style, number of characters, tropes, plot, continuity, relationships, etc.)
2. For each feature, ask for a list of 50 examples, ordered from most common to the most unique.
3. Randomly pick 10 features, and for each pick one of the 50 generated items. Ask for the rest of the features to match the theme.
4. Ask for 10 possible book outlines that match the chosen features, randomly pick between 2-8.
5. Create a detailed prompt that includes all the above features, and ask for a synopsis for each chapter, given the above outline chosen.
6. Given {features} and {outline} and {synopsis} write chapter 1.
7. for each chapter in list, given {...} and (optional) previous matching chapter(s), write chapter n+1
(optional 8.) given {...} and 2-3 consecutive chapters, align the ending / beginning of a new chapter for style / features / continuity, etc.
(optional 9.) given {...} and the whole book, list chapters / paragraphs that don't match the given {...} and provide a list of 5 improvements. (randomly choose 1 and ask for an edit).
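A rough sketch of steps 3 to 7 in Python, for anyone who wants to wire this up. Everything named here is hypothetical: `stub_llm` is a placeholder that only echoes the prompt (swap in a real model client), the feature lists are dummy strings standing in for the output of steps 1 and 2, and "pick between 2-8" is read as "choose one of outlines 2 through 8", which is only one possible reading.

```python
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for a real model call; replace with your own client.
LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Placeholder that echoes part of the prompt. Not a real model."""
    return f"[model output for: {prompt[:40]}]"


def sample_features(options: Dict[str, List[str]], k: int,
                    rng: random.Random) -> Dict[str, str]:
    """Step 3: pick k features at random, then one example for each."""
    chosen = rng.sample(sorted(options), k)
    return {name: rng.choice(options[name]) for name in chosen}


def write_book(options: Dict[str, List[str]], llm: LLM, rng: random.Random,
               k: int = 10, n_chapters: int = 3) -> List[str]:
    features = sample_features(options, k, rng)

    # Step 4: ask for 10 outlines, then pick one of outlines 2..8
    # (assumed reading of "randomly pick between 2-8").
    outlines = [llm(f"Outline {i} for a book with {features}")
                for i in range(1, 11)]
    outline = rng.choice(outlines[1:8])

    # Step 5: per-chapter synopsis for the chosen outline.
    synopsis = llm(f"Chapter synopses for {outline} given {features}")

    # Steps 6-7: write chapters in order, feeding in the previous chapter.
    chapters: List[str] = []
    for n in range(1, n_chapters + 1):
        previous = chapters[-1] if chapters else ""
        chapters.append(llm(
            f"Given {features}, {outline}, {synopsis} and previous chapter "
            f"{previous!r}, write chapter {n}."
        ))
    return chapters


# Dummy data standing in for steps 1-2: 20 features with 50 examples each.
options = {f"feature_{i}": [f"option_{i}_{j}" for j in range(50)]
           for i in range(20)}
book = write_book(options, stub_llm, random.Random(0))
```

The randomness lives in `sample_features` and the outline pick, so changing the seed (or dropping it) is what makes two runs diverge before the model writes a single chapter.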
----
Now, this probably won't give you something like cloud atlas, but they'll at least be different books. That's how I'd do it if I wanted to see how different they can write. Not 1000 "basic" prompts and expecting originality.
This is very naive. I can almost guarantee that some combinations of 20 * 50 features will hit on something that has never been written before in that specific combination. And if that's still not enough, increase the number of features. Add more randomness, add more steering, add random steering in random chapters, change it up, and so on.
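For a sense of scale, here is a quick count of the distinct starting points step 3 can produce. It assumes exactly the numbers above (20 features, 50 examples each, 10 features picked per book) and ignores the remaining features that the model fills in to match the theme.

```python
from math import comb

N_FEATURES = 20   # features defined in step 1
N_OPTIONS = 50    # examples per feature from step 2
N_PICKED = 10     # features chosen at random in step 3

# Ways to choose which 10 of the 20 features get pinned down.
feature_sets = comb(N_FEATURES, N_PICKED)
# Ways to choose one of 50 examples for each pinned feature.
value_choices = N_OPTIONS ** N_PICKED
total = feature_sets * value_choices

print(f"{feature_sets:,} feature subsets x {value_choices:,} value choices "
      f"= {total:.3e} combinations")
```

That is on the order of 10^22 combinations, so two books landing on the same starting point by chance is very unlikely.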
A good editor could probably reduce all LLM outputs on a subject down to the same point.
Yet another reason why the future is open weight.
Simply put, if you ask an LLM, you're always asking the same mind, and always for the first time.
People are making cookies with cookie cutter number 5 and other people wonder how come they are all the same.
An aside, I usually take my written blog posts through a pass on Notebooklm to generate a podcast like discussion about it. It used to be a good way to extract some insights I haven't thought of. But after 50 of them, I can predict what the host will "pushback" on and exactly when. Then they magically resolve their differences and agree with whatever the idea was. It's truly impressive when you just consume sporadically. But listen frequently and they converge into one blob.
And something that shows that behavior is a scammer's wet dream!
I presume you mean that what I and others are observing is patterns in mere rhetoric, and that this is just unimportant window dressing around the actual problem solving.
Yet, generation of rhetoric seems to be one of the key use cases, and one of the key features that make this technology seem “intelligent”.
AI is regression to the mean.
Much like Socialism.
On an acute basis, AI can be just as helpful as that safety net.
As a chronic matter, "it's not excellence--it's mediocrity".
I think it's that today's LLMs have access to poor/generic image generation models and people find it easier to ask ChatGPT or NanoBanana to make a cover instead of fine tuning a small SD model for the purpose.
https://infosec.exchange/@lcamtuf/116785283147249092
In these comments there's a common pattern where some users argue that the submission was not LLM written, often focusing on specific details to refute it (e.g. em-dashes), while other users see the overall pattern clearly and find it totally obvious. For me it's a kind of smell: it's off-putting and it's obvious. The article says to "trust your gut". But it's also something that comes with practice and time; it's not some innate thing. People may have better things to do than expend mental energy noticing patterns in a bunch of social media posts. The more I see it, the more I see it.
The takeaway I get is that it's okay to notice patterns and it's okay to not notice patterns. Remember that other people may be noticing patterns and associations in things that you might miss. Be charitable.
Far more interesting questions are:
1) If you can't see the patterns of LLM writing, does the idea that the thing you liked was written by an LLM worry you?
2) If you can see the patterns clearly, does the fact that it's LLM written worry you?
Because in our comments there are many who do not care that LLMs are writing content and there are many who do care. But are these correlated with those who can see the LLMs or who are blind to them?
Good human writing, especially on highly technical topics, is usually compression of information.
Like, you have some experience you want to share with others and you work your brain trying to put it into a concise story.
The problem is that AI generated texts are the opposite 99% of the time: the author usually has a bullet point list to feed into the machine, which adds a hallucinated, word-predicted story on top of it.
So the signal to noise ratio is much worse.
So reading AI texts is pretty much like listening to stories from humans with mental problems: no one really wants to listen to hallucinations, even if somewhere inside there is some useful information.
Horselover Fat had a pretty good take on machine generated content, too.
The irony in the machine generated songs in 1984 was that Winston clearly found meaning in them, feeling like they applied to him, even though he knew they were machine generated: (from memory) "Under the shade of the chestnut tree / I sold you and you sold me / here lie they and here lie we / under the shade of the chestnut tree" - that refers to him and Julia selling each other out, right?
Just like people today - and in George Orwell's day, which was why he made it - find meaning in things which are obviously formulaic manufactured corporate slop, like the endless MCU films.
Finding meaning in slop is not ennobling of the human spirit, and I see no reason to champion it.
Also if the meaning is that I sold you and you sold me; what is the upside here?
One question / quibble:
> if a hundred “authors” give their favorite AI tool a similar prompt
Do we really believe there are 100 different people generating those? When I saw the books, I assumed they were generated on demand to match the (to me unlikely) search terms.
I don’t think I’m invested enough to research this. Amazon slop is harder and harder to wade through. (Searches are very imprecise. Deliberate, I’m sure.)
Everything is slop if you make enough of it and squint hard enough.
The point with AI is if and how to steer it to produce something that is interesting and unique for you, not another bland cookie cutter blockbuster or lame summer song.
I want to err on the side that the author wrote this piece, but that dash is suggestive.
The author literally points to that tell in the article.
In a weird twist, I wonder if you’re an LLM?
I think the article's point is probably sound to some great extent, but I believe I owned a book with a title like "100,000 Whys" when I was young. With a dinosaur and a rocket on the front. I loved dinosaurs and rockets, they're even still cool today.
[1] https://infosec.exchange/@lcamtuf/116785283147249092
I'm sure someone deeply familiar with children's publishing would be able to talk authoritatively on the extent of new trends, but this seems to be the infosec community, and the evidence offered doesn't seem to actually be evidence of anything. There isn't a baseline. Children's encyclopedias might have been a hard-hitting game of radical creativity and high standards in the past, or they could have been an endless tide of derivative swill.
And using AI images seems unrelated. That's something people should just be doing. Ideally with better proofreading, but hey. The article's complaint was about lack of originality.
https://infosec.exchange/@lcamtuf/116785283147249092
This is Amazon #1 bestseller in "Children's Encyclopedias"!