

Discussion (27 Comments)

zjp • about 1 hour ago
Different models, similar number representations. Different models for different languages, similar concept representations. They have to learn all of this from human text input, so they're not divining it themselves. It all makes a strong case for universal grammar, IMO.
MarcelOlsz • 19 minutes ago
I refuse to learn esperanto, sorry.
williamcotton • 31 minutes ago
> It all makes a strong case for universal grammar, IMO.

What about through the lens of the Norvig-Chomsky debate?

causal • about 3 hours ago
Title is editorialized and needs to be fixed; the paper does not say what this title implies, nor is that the title of the paper.
wongarsu • about 2 hours ago
HN automatically removes the word "How" from the beginning of titles. I suspect this title is one instance of that.
causal • about 1 hour ago
Unfortunate if so, but I'm finding plenty of counterexamples in the past day alone: https://hn.algolia.com/?dateRange=last24h&page=0&prefix=true...
btilly • about 2 hours ago
The exact phrase appears in the title. There is a title length limit. In this case, I don't think that it is wrong to pick the most interesting piece of that title that fits in the limit.
matja • about 3 hours ago
The eigenvalue distribution looks somewhat similar to Benford's Law - isn't that expected for a human-curated corpus?
btilly • about 2 hours ago
I would expect that for any sampling of data that has a roughly similar distribution over many scales.

Which will be true of many human-curated corpuses. But it will also hold for natural data, such as the lengths of random rivers or the brightness of random stars.

The law was first discovered because logarithm books tended to wear out at the front first. That turned out to be because most numbers had a small leading digit, and therefore the pages at the front were being looked up more often.
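The connection to scale-spanning data can be sketched directly. This toy simulation (not the discussed corpus) samples values uniformly in log space over several decades, a stand-in for "roughly similar distribution over many scales", and recovers Benford's leading-digit frequencies:

```python
import math
import random
from collections import Counter

# Sample uniformly in log10 space across six decades; any data that is
# roughly scale-invariant behaves this way.
random.seed(0)
samples = [10 ** random.uniform(0, 6) for _ in range(100_000)]

# Every sample is >= 1 and < 1e6, so str() never uses scientific
# notation and the first character is the leading digit.
leading = Counter(int(str(x)[0]) for x in samples)

for d in range(1, 10):
    observed = leading[d] / len(samples)
    benford = math.log10(1 + 1 / d)  # Benford: P(leading digit = d)
    print(f"{d}: observed {observed:.3f}  Benford {benford:.3f}")
```

The observed frequencies track log10(1 + 1/d) closely, with digit 1 appearing about 30% of the time.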

jdonaldson • about 3 hours ago
(Pardon the self-promotion) Libraries like turnstyle are taking advantage of shared representation across models. Neurosymbolic programming: https://github.com/jdonaldson/turnstyle
fmbb • about 1 hour ago
> Language models trained on natural text learn to represent numbers using periodic features with dominant periods at T=2,5,10.

This proves a decimal system is correct. Base twelve numeral systems are clearly unnatural and inefficient.
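A rough illustration of what "periodic features with dominant periods" means, using hypothetical toy data rather than the paper's method: if an embedding dimension oscillates with the number n, a discrete Fourier transform over n = 0..N-1 peaks at the corresponding period.

```python
import math

N = 200
# Toy feature with period 10, the way a units-digit feature might look
# (assumption: features in a real model would be noisier).
feature = [math.cos(2 * math.pi * n / 10) for n in range(N)]

def dft_power(signal, k):
    """Power of frequency bin k for a real-valued signal."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    return re * re + im * im

# Scan the positive-frequency bins and pick the strongest one.
powers = {k: dft_power(feature, k) for k in range(1, N // 2)}
peak = max(powers, key=powers.get)
period = N / peak  # dominant period of the feature
print(period)  # -> 10.0
```

Applied to real embedding dimensions, the same probe would surface the reported periods of 2, 5, and 10.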

aljgz • about 1 hour ago
This is just a result of base 10 being dominant in our natural languages. I assume if we really used base 12, things would be different.

What would using base 12 in our natural language mean? Number names would need to be based on 12, not 10. Thirteen, twenty-seven: our numbers have base 10 embedded in their naming.

dunham • 18 minutes ago
Historically, quite a few languages were (or are) vigesimal. Perhaps decimal is also unnatural.
ACCount37 • about 4 hours ago
The "platonic representation hypothesis" crowd can't stop winning.

Potentially useful for things like innate mathematical operation primitives. A major part of what makes it hard to imbue LLMs with better circuits is that we don't know how to connect them to the model internally, in a way that the model can learn to leverage.

Having an "in" on broadly compatible representations might make things like this easier to pull off.

causal • about 3 hours ago
You seem to be going off the title which is plainly incorrect and not what the paper says. The paper demonstrates HOW different models can learn similar representations due to "data, architecture, optimizer, and tokenizer".

"How Different Language Models Learn Similar Number Representations" (actual title) is distinctly different from "Different Language Models Learn Similar Number Representations" - the latter implying some immutable law of the universe.

dnautics • about 2 hours ago
> latter implying some immutable law of the universe

I think the implication is slightly weaker -- it implies some immutable law of training datasets?

FrustratedMonky • about 3 hours ago
Same with images maybe?

Saw a similar study comparing brain scans of a person looking at an image to a neural network processing the same image. And they were very 'similar'. Similar enough to make you go 'hmmmm, those look a lot alike, could a neural net have a subjective experience?'

ACCount37 • about 1 hour ago
"Subjective experience" is "subjective" enough to be basically a useless term for any practical purpose. Can't measure it really, so we're stuck doing philosophy rather than science. And that's an awful place to be in.

That particular landmine aside, there are some works showing that neural networks and the human brain might converge to vaguely compatible representations. The visual cortex is a common culprit, partially explained by ANN heritage perhaps - a lot of early ANN work was trying to emulate what was gleaned from the visual cortex. But it doesn't stop there. CNNs with their strong locality bias are cortex-like, but pure ViTs also converge to representations similar to CNNs'. There are also similarities found between audio transformers and the auditory cortex, and a lot more findings like it.

We don't know how deep the representational similarity between ANNs and BNNs runs, but we see glimpses of it every once in a while. The overlap is certainly not zero.

Platonic representation hypothesis might go very far, in practice.
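One standard way such cross-model (or model-to-brain) representational similarity is measured is linear CKA (centered kernel alignment), which compares two sets of embeddings of the same inputs while ignoring rotations and rescalings. A minimal sketch with toy data; the rotated "model B" here is an assumption for illustration, not from any study mentioned above:

```python
import math
import random

random.seed(1)

def center(X):
    """Subtract the per-dimension mean from each row."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def gram(X):
    """Gram matrix X @ X^T over the sample dimension."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]

def frob_inner(A, B):
    """Frobenius inner product of two matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    Kx, Ky = gram(center(X)), gram(center(Y))
    return frob_inner(Kx, Ky) / math.sqrt(frob_inner(Kx, Kx) * frob_inner(Ky, Ky))

# "Model A": random 2-D embeddings of 50 inputs. "Model B": the same
# embeddings rotated 45 degrees and scaled by 2 -- a stand-in for a
# "similar representation" in different coordinates.
A = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
B = [[2 * (c * x - s * y), 2 * (s * x + c * y)] for x, y in A]

print(round(linear_cka(A, B), 3))  # rotation + scaling preserves CKA -> 1.0
```

Because linear CKA is invariant to orthogonal transforms and isotropic scaling, the two toy "models" score a similarity of 1 despite having different raw features.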

LeCompteSftware • about 3 hours ago
"using periodic features with dominant periods at T=2, 5, 10" seems inconsistent with "platonic representation" and more consistent with "specific patterns noticed in commonly-used human symbolic representations of numbers."

Edit: to be clear I think these patterns are real and meaningful, but only loosely connected to a platonic representation of the number concept.

ACCount37 • about 2 hours ago
Is it an actual counterargument?

The "platonic representation" argument is "different models converge on similar representations because they are exposed to the same reality", and "how humans represent things" is a significant part of reality they're exposed to.

brentd • about 2 hours ago
Regardless of whether the convergence is superficial or not, I am interested especially in what this could mean for future compression of weights. Quantization of models is currently very dumb (per my limited understanding). Could exploitable patterns make it smarter?
ACCount37 • about 2 hours ago
That's more of a "quantization-aware training" thing, really.
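For context on what the "dumb" baseline in this exchange looks like: per-tensor absmax quantization to int8 picks one scale for an entire tensor and ignores any structure in the weights. A minimal sketch (a generic illustration, not any particular library's scheme):

```python
def quantize_int8(weights):
    """Per-tensor absmax quantization: one scale for the whole tensor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Toy weight tensor; note how the single outlier (3.0) stretches the
# scale and wastes resolution on the small weights.
w = [0.02, -1.6, 0.7, 3.0, -0.004]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error per weight is bounded by scale / 2.
```

Smarter schemes (per-channel scales, outlier handling, or quantization-aware training as suggested above) reduce exactly this kind of wasted resolution.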
dboreham • about 3 hours ago
It's going to turn out that emergent states that are the same or similar in different learning systems fed roughly the same training data will be very common. Also predict it will explain much of what people today call "instinct" in animals (and the related behaviors in humans).
ACCount37 • about 1 hour ago
Evolution is an optimization process. So if platonic representation hypothesis holds well enough, there might be some convergence between ML neural networks and evolved circuits and biases in biological neural networks.

I'm partial to the "evolved low k-complexity priors are nature's own pre-training" hypothesis of where the sample efficiency in biological brains comes from.

panagathon • about 3 hours ago
Oh yeah, that's clever
gn_central • about 4 hours ago
Curious if this similarity comes more from the training data or the model architecture itself. Did they look into that?
OtherShrezzing • about 4 hours ago
They state in the opening paragraph that both are important, and both are investigated in the paper.