Discussion (38 Comments)
Though I feel like industry veterans (especially those working with LLMs) came to this conclusion without having to write a single prompt. Even ignoring the technical merits of these kinds of hacks, if you think you've outwitted billions of dollars of statistics with a prompt, you're probably wrong at this point.
What I find most interesting is the popularity of these snake oils, especially the ones that are easy to install and never check. The tech moves so fast and the research is so scarce and poor-quality that the bullshit asymmetry principle wins and people buy into these cargo cults.
Maybe we need a plugin to check if a new plugin/prompting technique/LLM lifehack is BS.
My understanding is that there was only 1 run per configuration?
If that is correct, then because of the run-to-run variability it really doesn't say much. It will take several trials per prompt per arm before it starts to look like it is stabilizing on a plot. It is prohibitively expensive, so I've been running the same prompt on the same model 5 times in order to get a visual understanding of performance.
Someone did the same with lambda calculus yesterday. I wanted to make the point about how much run-to-run variability and cost difference you see with the same prompt on the same model across only 5 trials. I classified each of the thinking steps using Opus 4.6 (~$4 in tokens per run just for that) and plotted them with custom flame graphs. [0]
When the run-to-run variability is between 8,163 and 17,334 tokens none of these tests mean that much.
[0] https://adamsohn.com/lambda-variance/
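A quick sketch of the point above. Using hypothetical token counts chosen to echo the 8,163–17,334 spread the parent mentions (not the author's actual data), the sample standard deviation across a handful of runs dwarfs the differences most prompt tweaks claim to produce:

```python
import statistics

# Hypothetical token counts from 5 runs of the same prompt on the same
# model. Illustrative values only, picked to match the spread quoted above.
runs = [8163, 11502, 9870, 14211, 17334]

mean = statistics.mean(runs)      # average tokens per run
stdev = statistics.stdev(runs)    # sample standard deviation (n - 1)
spread = max(runs) - min(runs)    # best-case vs worst-case run

print(f"mean tokens: {mean:.0f}")
print(f"stdev:       {stdev:.0f}")
print(f"spread:      {spread} tokens")
```

With numbers like these, a single run per configuration can differ from another single run by thousands of tokens purely by chance, which is why one-shot comparisons between prompting techniques don't mean much.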
Slightly off-topic: it's quite apparent that you've used Claude as an editor for the blog post. Every sentence has been sanded smooth — the rough edges filed off, the voice flattened, the rhythm set to metronome. It doesn't read like writing anymore. It reads like content. Neat little triplets. Tidy paragraphs. A structure so polished it could pass a rubric, but couldn't hold a conversation. /s
In my opinion that is unnecessary and detracts from a great, simple piece. I miss human writing.
Right, and that final response forms the latest context for your next follow-up prompt. Not having that final reasoning laid out in the conversation history leaves a huge gap in successive reasoning. I remember playing around with this idea in the Sonnet 3.x days and it was immediately obvious how the ability to handle long running tasks degraded. If you are just doing single-shot work for some reason, sure, but that's not what most real world usage looks like these days.
It is the same idiocy that permeates EVs. You buy an expensive car to get from A to B that at the same time offers you comfort. If I have to think about whether or not to use the seat heating, I'm out of my comfort zone. So no, fuck caveman, and I don't fucking care about the burned tokens.
Be brief. It's easy, needs no setup, and isn't yet another mindless mumbo-jumbo extension with its 325 dependencies.
You push the seat heating button when your seat feels cold. What is there to think about?
Then why are you using AI?
Not a big difference between an articulate idiot and a succinct one.
It would have been hilarious if the author spoke like a caveman in his video or had a section in that article where he explained his conclusions like a caveman.