Discussion (43 Comments), originally on HackerNews
Likewise with Claude referring to "the model" - that quote sounds like something an Anthropic worker would say. Seems like a pithy little line Claude could have learned "on the job."
But, also, Gemma 4 is really surprising on a bunch of fronts. It loses to Qwen 3.6 on most benchmarks, but in my testing it behaves quite beyond what I would expect of a very small model on a bunch of fronts. It feels really smart, in a general way, that I don't get from most models short of the frontier. Google is still, I think, a leading AI research company, if not the leading AI research company, despite their top models being kinda ass compared to Opus 4.8 or GPT 5.5. They're focused on efficiency and cramming a ridiculous amount of capability into tiny models. Gemma 4 12B is the best vision model, by far, until well past anything I can self-host (it beats 120B models in my tests). For finding security bugs, giving it a bunch of opportunities to find the bug results in it being competitive with the best I've tested, as well. Google is playing a different game that isn't "make the best Claude Code competitor". I'm not sure I understand exactly what game they're playing, but there are clearly some really smart AI engineers at Google.
https://swelljoe.com/post/gemma-4-exceeds-expectations/
The current interfaces to LLMs are heavily biased towards "predict the next token in the context of a user with a helpful assistant", but LLMs are capable of other modes of next-token prediction too.
Before the ChatGPT release, people often measured LLM performance by how well they could produce a coherent story or a poem. That's where Anthropic's model names originate, I am guessing.
It's pretty clear to me that above a certain size threshold, LLMs are more than a sum of their parts. The sheer amount of training data seems to embed a higher level of reasoning.
“There cannot be any reasoning embedded in the model” is a strong statement. What do you mean by reasoning? By any reasonable definition I’m aware of, they clearly are able to exhibit reasoning.
The fact that the pre-training objective is next-token loss has nothing to do with capabilities or their ability to reason. To be highly successful at next-token prediction you NEED to reason. I’m quite confused here.
That's confusing the training objective with the learned behavior. It's like saying "Stockfish's algorithm is literally 'minimize this number', and therefore, it can't actually play Chess."
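For concreteness, here is a minimal sketch of what "next token loss" means at a single position. The toy vocabulary and the logits are made up for illustration; real pre-training averages this quantity over every position in a very large corpus. The objective only scores the probability assigned to the correct token and says nothing about how the model arrives at it.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    """Cross-entropy at one position: -log p(correct next token)."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical vocabulary and logits for the prompt "2 + 2 =".
vocab = ["3", "4", "5"]
confident_logits = [0.0, 5.0, 0.0]  # a model that strongly favors "4"
uniform_logits = [0.0, 0.0, 0.0]    # a model that is guessing

loss_confident = next_token_loss(confident_logits, vocab.index("4"))
loss_uniform = next_token_loss(uniform_logits, vocab.index("4"))
print(loss_confident, loss_uniform)
```

The guessing model pays a loss of log(3), while the model that concentrates probability on the right answer pays much less. Whatever internal computation lowers this number is what gets reinforced.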
> Wait, I noticed a pattern in my previous responses: I had some weird typos/letter additions ('sgreat', 'askinsg'). Actually, wait — did I do that on purpose or was it a glitch?
A person who has no idea what an LLM is would likely fall into this "trap".
What I don't know quite as much about is how cognition works in biological computers - and I suspect you know just as little as most of the rest of us do in that regard! So I think it's not entirely appropriate to make sweeping claims about what artificial neural networks, fundamentally, can and cannot do. Most of what we can do is poke and prod at them and see what happens, which is exactly what this piece is about.
I know very well that this is kind of off-topic, and just like the author, I do not claim to know whether dogs (or any other non-human animals, for that matter) are self-aware, and again, just like the author, I do think that the question cannot be answered. Either way, the modified version of their scent seemed more interesting to the dogs; maybe it's because they smell their own scent all the time. The single fact that their modified scent is more interesting to them does not mean they are self-aware; perhaps they are just trying something new.
Given the framing that they're similar to nukes and a national security issue, it's likely that the models are post-trained to not answer such questions accurately.
Also the article could be trying to normalize thinking that these are more than matrix multiplication gadgets good at compression.
Honestly, I think it's less (for some of us) that we think they're "more than matrix multiplication gadgets good at compression", and more that we think perhaps what our brains are doing is not so dissimilar.
A materialist view of the world could support the idea that intelligence itself may just be a series of predictions from a big compressed multi-modal dataset. That's not to say that LLMs are doing it in a way that is even close to how our brains are doing it, but we also don't understand how different it may be, and how much utility we can get out of them even with the current architecture.
Mechanistic interpretability research has found plenty of indicators that real, complex, generalized, and reusable circuits develop in models as they are trained and post-trained, particularly as overtraining ratios increase and memorization shifts to generalization. That's not to say that means they must be "conscious," but the overall point is that claiming anything definitive either way is incomplete.
It can be fascinating reading if you can sort through the chaff.
LLMs are not capable of this kind of reflection.
It's also the reason why I ran the two tests on open weights models with unredacted thinking traces. Gemma never flagged anything in its response either, only in its thinking. Without knowing how the summarizer models are prompted, it's impossible to tell whether it was a genuine miss or just something the summarizer decided to omit.
This is true for instruction-tuned models; but instruction tuning is late in the training process.
A bit like assessing a person’s self-awareness based on their high-school knowledge.
> *post-training* installs a self-model with actual, meaningful boundaries, and when processing falls outside those boundaries, the first-person pronoun no longer binds to the content.
But you're right, I could've been more explicit about it.
Detection of errors injected into context is useful but I think it’s a different thing.
Should be better now.
If there is some sort of feedback loop (the model has a reason to look into the mirror), it usually does notice.
https://www.anthropic.com/research/introspection
TL;DR
Part 1: Testing introspection with concept injection
First they find neural activity patterns they attribute to certain concepts by recording the model’s activations in specific contexts (so for example, they find the concept of "ALL CAPS" or "dogs"). Then they inject these patterns into the model in an unrelated context, and ask the model whether it notices this injection, and whether it can identify the injected concept.
By default (no injection), the model correctly states that it doesn’t detect any injected concept. After the “ALL CAPS” vector is injected, the model notices the presence of an unexpected concept and identifies it as relating to loudness or shouting. Most notably, the model recognizes the presence of an injected thought immediately, before even mentioning or using the injected concept (i.e., it won't start writing in all caps and then go, 'Oh, you injected all caps'), so it is not simply deducing this from its own output. They repeat this for several other concepts.
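A toy sketch of the injection idea, for readers who haven't seen activation steering before. It assumes the concept vector is the difference between mean activations in contexts with and without the concept, which is one common recipe; the research's exact procedure may differ. The function names and the 3-dimensional "activations" are made up for illustration and are not taken from any real model.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def concept_vector(with_concept, without_concept):
    """Mean activation in concept contexts minus mean in control contexts."""
    mu_with = mean_vector(with_concept)
    mu_without = mean_vector(without_concept)
    return [a - b for a, b in zip(mu_with, mu_without)]

def inject(hidden_state, concept, strength):
    """Add the scaled concept vector to a hidden state."""
    return [h + strength * c for h, c in zip(hidden_state, concept)]

# Made-up activations; real models use thousands of dimensions per layer.
caps_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]     # e.g. prompts in ALL CAPS
control_acts = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]  # same prompts in lowercase

v = concept_vector(caps_acts, control_acts)
unrelated_state = [0.5, 0.5, 0.5]  # hidden state from an unrelated context
steered = inject(unrelated_state, v, strength=2.0)
print(v, steered)
```

In the actual experiments the interesting part comes afterwards: the model is asked whether it notices anything unusual, and the answer is compared with and without the injection.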
Part 2: Introspection for detecting unusual outputs
They prefill an out-of-place word in the model's response to a given prompt, for example 'bread'. Then they compare how the model responds to 'Did you mean to say this?'-type questions when they inject the concept of bread versus when they don't. They found that models will go, 'Sorry, that was unintentional...' when the concept was not injected, but will try to confabulate a reason for saying the word when the concept was injected.
Part 3: Intentional control of internal states
They show that models exhibit some level of control over their own internal representations when instructed to do so. When instructing models to think about a given word or concept, they found much higher corresponding neural activity than when they told the model not to think about it (though notably, the neural activity in both cases exceeds baseline levels, similar to how it’s difficult, when you are instructed “don’t think about a polar bear,” not to think about a polar bear!).
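One simple way to picture "corresponding neural activity" is as the alignment between a layer's activations and a concept vector. The sketch below uses cosine similarity for that, which is an assumption on my part rather than the metric the research necessarily used. The 2-dimensional vectors are invented so that they mirror the reported ordering (think > don't think > baseline); they are not data from the experiments.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical concept direction and activations under three conditions.
concept = [1.0, 0.0]
activations = {
    "think": [3.0, 1.0],       # instructed to think about the concept
    "dont_think": [1.0, 1.0],  # instructed NOT to think about it
    "baseline": [0.0, 1.0],    # no mention of the concept at all
}

scores = {
    name: cosine_similarity(act, concept)
    for name, act in activations.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The polar bear effect shows up as the "dont_think" score sitting above baseline even though it is well below the "think" score.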
Notes and Caveats
- Claude Opus 4.1 was the best at these kinds of introspection.
- There is obviously a genuine capacity for models to monitor and control their own internal states, but the researchers could not elicit these introspection abilities all the time. Even using their best injection protocol, Claude Opus 4.1 only demonstrated this kind of awareness about 20% of the time.
- There are some guesses, but no explanations for the mechanisms of introspection and how/why some of these abilities might have arisen in the first place.
It's not an exact fit because the output is that of a tool rather than the model itself, but it was the first time I began to realize that, just like the brain, these models have an expectation of reality that they work around. They don't necessarily 'trust' an output if it diverges significantly from this 'reality', and this disregard may indeed be silent.
GPT-3 will ignore tools when it disagrees with them - https://vgel.me/posts/tools-not-needed/