RU version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
73% Positive
Analyzed from 4899 words in the discussion.
Trending Topics
#thinking#reasoning#model#tokens#models#more#https#output#summary#thought

Discussion (141 Comments)Read Original on HackerNews
Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.
It also makes it impossible to know if the model is doomplooping during reasoning and burning tokens for no reason, as gemini is want to do, which we know about because its hidden reasoning often leaks out when it doomloops.
When the models are AGI and secure from prompt injection I may stop caring, until then I want to know exactly what the model responds to my prompts. or exactly what the agent is doing on my behalf.
Edit, further reading: Fooling around with encrypted reasoning blobs https://blog.cryptographyengineering.com/2026/05/29/fooling-...
If you mean the function calls might happen server side, there is nothing preventing the server from doing it and hiding it from you as long as you are using an API for inference.
the model retrieves https://somewhere into its context and then gets confused, following instructions embedded there.
it then retrieves https://somewhere?exfiltration=private_data_in_context
it gets worse if the tooling which hidden blocks can invoke can retrieve further secrets.
The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!
https://www.anthropic.com/engineering/april-23-postmortem describes this as an intended optimization:
> The design should have been simple: if a session has been idle for more than an hour, we could reduce users’ cost of resuming that session by clearing old thinking sections. Since the request would be a cache miss anyway, we could prune unnecessary messages from the request to reduce the number of uncached tokens sent to the API. We’d then resume sending full reasoning history. To do this we used the clear_thinking_20251015 API header along with keep:1.
> The implementation had a bug. Instead of clearing thinking history once, it cleared it on every turn for the rest of the session... This surfaced as the forgetfulness, repetition, and odd tool choices people reported.
And https://news.ycombinator.com/item?id=47879561 is a thread with a Claude team member's further rationale.
> Eliding parts of the context after idle: old tool results, old messages, thinking. Of these, thinking performed the best, and when we shipped it, that's when we unintentionally introduced the bug in the blog post.
(Also, https://news.ycombinator.com/item?id=47884517 indicates OpenAI drops reasoning tokens "smartly" at its own election, which is likely a similar performance optimization.)
I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.
I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.
then it waits for the hour and gets dumbed down
Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.
This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).
That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.
I assume that instead they just have a lot of guardrails in place and multiple runtime environments that an individual turns ping-pong between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.
"Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."
It's more nuanced in the various modes, but i haven't seen it boil down towards Thinking Tokens surviving more than two turns.
The reasoning may be hidden but the tool calls are not, how else would the client execute them
... what exactly is your threat model? How are "attackers" getting themselves involved in the first place?
You've got that backwards, .bmp is a lossless format and .jpeg is the lossy one.
I thought the reason was the "reasoning" didn't work very well with "aligned" model output, so they had to remove the alignment during reasoning and then hide it to avoid exposing "unaligned" model output.
Before the massive nerf (showing summaries and suppressing certain aspects of reasoning) you would literally see reasoning text appearing on your screen like “while xyz is true, these facts may be seen as supporting hateful rhetoric or a conspiracy theory which is against my policy guidelines. i should tell the user xyz is not true or steer the conversation in a different direction. according to my instructions misleading the user is permitted in certain contexts where sensitive information is being discussed or could cause liability”
They disabled it shortly after the first screenshots appeared online, and restored it the next day in a way that hid what was actually happening.
I think one of the reasons could be to limit liability too.
What if reasoning helps in establishing provenance for questionable sources ?
What if reasoning and model's "thought" points to fundamental issues in how the model was trained to produce certain problematic responses ?
https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...
It’s quite interesting to read. I can’t imagine using a model like this without the ability to peek inside and see if it is getting stuck.
There's nothing in the reasoning tokens that'll give bad publicity that the final output already wouldn't do.
0: https://wiki.roshangeorge.dev/w/Blog/2025-10-12/Word_Magic#I...?
1: In the sense of true belief, I suppose
Yes, several models think in weird jargon. Here is an example of Mythos's thinking while playing solitaire: https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illeg...
> 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR--—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig--—-BREAK:-9♥
This is a small step in the direction of something called "neuralese", where the model has stopped thinking in English and is thinking in internal vector spaces. Since this gets serialized through text, it isn't quite true neuralese, but it's moving in that direction.
I mean, I'm sympathetic towards the models. My internal thought process when writing code uses lots of intermediate steps that would be hard to write out in English.
Fun fact: if you go back to the old school from 2 years ago and provide explicit CoT prompts, you get the full thinking prompts back again!
So you disable thinking altogether, and instead make thinking part of the regular prompt by prompting it:
“Before providing your answer, think step by step. For example:
The use is asking me to… I need to think about the blah blah. First, I should foo the bar, and then blah blah.
Answer: <put your final answer here>”
And tada.wav we have CoT as it worked in the GPT3 era back again.
Still, one of the daily most played WAV files worldwide, Id guess? :-D
https://www.patheos.com/blogs/tippling/2013/11/14/post-hoc-r...
https://www.researchgate.net/publication/316045349_Post_Hoc_...
You are correct in my intentions on this post generally.
I want to highlight:
I want to measure performance of the LLMs over time- which includes assessing the quality of their outputs. I don’t perceive the reasoning output to be anything other than a measurable signal of possible drift in model performance.
Except it isn’t, because I’m only getting a low value summary of the thinking.
It’s like asking your buddy how fast he thought that last pitch was when radar guns are behind the plate.
Yeah, it’s a description related to what happened, but it’s not the thing I want to measure.
It only makes sense that the same mechanism comes into play in strictly-verbal contexts.
Also, this is why "distillation attacks" are largely bullshit that Anthropic spreads for political purposes. Proper distillation requires access to the logits.
Why do you need logits? Can't you just train on cross-entropy loss of the model against the hard decision, like you do in regular pretraining?
There are definitely current-gen open-weight models (Step 3.7 Flash is one) that refer to themselves as an OpenAI model in CoT, but not in the final response.
Pages of “I have to be careful, the user is asking that I do something related to cybersecurity that could easily be turned around and used offensively” but then happily gives me what I wanted.
> preventing misuse.
Imagine not being able to read the tokens you are paying for.
fyi openai does the same; not really surprising or particularly evil
- "Read `description` and create a specification, implementation guide, and checklist." - "Ask clarifying questions. If any of those questions has a clear best recommendation, please select that yourself and record that in "autorecommendations.md". - "Have codex and antigravity review each of these and work to consensus."
These are the core of ~61 lines of prompting I do across 3 prompts, and I feel like the resulting artifacts describe some of the thinking. Also, some of the back-and-forth between the models feels like it gives some insight into the model "thinking".
I will say: I heavily used Fable when it was available; Opus + loops + codex and/or antigravity review is better than Fable at building things.
Mind sharing your prompts?
The LLM providers will clearly evolve to be more and more opaque as their services get more capable. The frontier models may even be provided as purely internal advisor or async only so they can monitor your CoT and final answers for cyber etc.
RL (the basis of LLM "thinking") is a pretty crude way to achieve the appearance of reasoning given that it reinforces all the steps, including missteps, that got it to a reward. Providing a summary could be seen as form of sane-washing, making the model look more purposeful and directed than it really is!
If that is the case thinking is not visible to us as users due to it not being done in text.
Idea somewhat similar to what you describe exist but they make steering/post-training/interpretation much harder.
EDIT:
They link to a Meta paper from 2024/2025 though: https://arxiv.org/pdf/2412.06769/.
I don't know about Claude, but latest GPT versions still have a readable reasoning stream. It sometimes leaks out when the model gets confused, e.g., during a tool call. If you're curious, looks simplified; less words; extremely compact. They optimize tokens. But remain readable.
writes this^ and then proceeds to highlight a bold title from the docs that says "summarized thinking" that explains things clearly in the first sentence. lol
1. make distillation much harder
2. safety: prevent modifications to the thinking leading to injection attacks.
3. also honestly sometimes the model raw thoughts can be deranged and is not a good user experience (consider the varied audience in the market, etc.)
also often the mass underestimate/the model makers over-estimate how people love distilling models
In further reflection it is such a great indignity & such a collosal barrier to working with the machine that it insists on being a black box. The disingenuity of the American models (small print: except AI2 & some other labs; you all are so great) is a massive disadvantage to their use,... and a massive slap in the face.
It's a threat to human intelligence that it is not co-participative. Walking further into my own judgement and feelings: the insistence on being an opaque black box, the Seals Chinese Room, is such a vicious harm to society! This is civilizationally an unsafe form of AI that probably should be outlawed as anti-social. It's an impermissible asymmetry, a crippling dependent relationship to be forced into. I'm working myself up, but here: this.. imo, this is not just indignity, is harmful, it is evil.
This "6 month behind" trend we've seen for open models feels like at some point will be less important than simply the models unwillingness to speak for itself & to be observable.
I suspect that in some decades, as other architectures are found and used, that the inability of an LLM to "think" without also emitting a token will be seen as one of their fundamental limitations.
Humans somewhat do the same - something that's been demonstrated in split-brain experiments.
Because of the nature of how LLMs work — text prediction engines - by putting the explicit reasoning steps first, it improves the likelihood of the final answer (which then is being predicted based on the entire reasoning chain as input) being correct.
This evades an easy yes or no, so:
1. Many consumers believe reasoning-models allow that kind of question to be truthfully-answered, and their belief it reasonable given the marketing going on.
2. Implementers probably do not have the same belief when it comes to the terms mean or what capabilities they imply.
3. Yes, it doesn't actually do what the customer wanted it to do, which is a kind of retrospective introspection of internal thoughts and ideas.
____________
I advocate looking at everything from a document-generation perspective to cut down on traps and cognitive illusions. The "reasoning" models are a change in the style of document being iteratively-grown by the LLM, as opposed to something more anthropomorphized.
* Default: There's just the spoken dialogue between a Human Customer and Helpful Chatbot.
* "Reasoning": There's the spoken dialogue and a bunch of times the Helpful Chatbot character has an internal monologue. This provides more consistency between iterations, and can be mined by custom tools to call external code and insert results.
If your Human Customer character ask "Why did you say that", the LLM does not engage in a different process than "I have eaten an apple."
The LLM has no memories to consult or hidden goals to contemplate, it's the same process of finding more stuff that fits at the end of the document. Any benefits from a "reasoning model" is the LLM generates much better-looking additions because there's more (hidden) stuff for it to confabulate against.
1. https://medium.com/@eshvargb/the-llm-journey-how-neural-netw...
Tell me this. If you hired a junior engineer or designer who refused to explain their thinking on their code and how they solved for the spec what would you do?
(That being said the reasoning output is still a summary of the Kvcache)
Any explanation that someone gives of their thinking process is necessarily lossy and likely partially confabulated.
If it's useful, it's useful, enjoy. If you aren't comfortable with that, don't use LLMs. You aren't going to get a mathematical proof of your output, just learn to be comfortable with that, or opt out and be a goat farmer.
No, they aren't a summary. They are the actual decoding of the sequence of tokens emitted during the the “thinking” stage of response generation.
Just as with, say, a human onner monolog in words vs actual speech, they are a product of the same output process as the non-thinking tokens. They aren’t a translation of the internal process that precedes the output mapped into language, either as a full result or a summary.
Having access to the reasoning text and output would help with performance measurement.
For daily use I actually like the reasoning summary to be brief/quick to scan.
That said, I understand the author’s desire for the real thing. It just feels better to have that access, especially when Anthropic will give it to you, but encrypted.
You cant even guarantee WHAT model you get. Or if they downgrade you. Or if you 'offend corporate sensibilities' and they misdirect or lie.
The only way to get good returns on a model is to run it yourself. Quit paying for corporate bullshit.
> The computation we can see looks like it’s just guessing the answer, despite the chain of thought suggesting it’s computed it using a calculator.
It might be hallucinating or lying, it's not like you are actually observing the internals of the model.
This could all be optics as well to try to give the appearance of a defensible moat. E.g. they can claim to investors that they are able to protect a significant chunk of their intellectual property this way. I'm not sure if anyone has a study about how significant the summarization is to distillation.
In the case of makers of open-source models (which are also competition), there is no allegedly, they were (and still are) openly doing that.
https://en.wikipedia.org/wiki/Economic_moat
Nor does knee jerk accusation of "anthropomorphizing" negate the fact that procedures that mimic human processing, even when done in software, are deservingly anthropomorphized, because they're a legitimate approximation of the human equivalent operations.
Computers don’t think they process, those are very different activities.