Discussion (6 Comments)

estebarb • 1 minute ago
That sounds very similar to what is known in self-supervised learning as representation collapse. Wonder if we could copy some of the anti-collapse mechanisms from SSL into GPT... after all, they are ways to increase the differential entropy. However, I'm not sure it would be useful after all: a pure function cannot produce more entropy than it receives... and natural language as text has much less entropy than other domains...
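
For anyone who wants to poke at the "pure function cannot produce more entropy" point, here is a minimal sketch for the discrete (Shannon) case only. The helper names `entropy_bits` and `push_forward` are made up for illustration, and the toy distribution is an assumption, not anything from the paper. Differential entropy behaves differently under rescaling, so this does not settle the continuous case.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def push_forward(probs, f):
    """Distribution of f(X) when X follows `probs` and f is deterministic."""
    out = {}
    for x, p in probs.items():
        y = f(x)
        out[y] = out.get(y, 0.0) + p
    return out

# Toy input: uniform over 8 symbols, i.e. 3 bits of entropy.
uniform8 = {x: 1 / 8 for x in range(8)}

h_in = entropy_bits(uniform8)
# A bijection (3 is coprime with 8) only relabels outcomes, so entropy is preserved.
h_bijective = entropy_bits(push_forward(uniform8, lambda x: (x * 3) % 8))
# A many-to-one map merges outcomes, so entropy drops.
h_lossy = entropy_bits(push_forward(uniform8, lambda x: x // 2))

print(h_in, h_bijective, h_lossy)
```

The bijection keeps the entropy unchanged and the lossy map lowers it; no deterministic map of a discrete variable can push it above the input's entropy.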
aetherspawn • about 4 hours ago
It makes sense to me that distributing across more parameters results in models that can be quantized more heavily (information theory: more bits available).

I wonder if anyone has figured out how the information is compressed and calculated the amount of information an LLM can hold depending on its size

woadwarrior01 • about 3 hours ago
> I wonder if anyone has figured out how the information is compressed and calculated the amount of information an LLM can hold depending on its size

You might want to look at Physics of Language Models[1]. IIRC, the authors estimate it to be ~2 bits of factual knowledge per parameter.

[1]: https://physics.allen-zhu.com/
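
To put the ~2 bits per parameter figure in perspective, here is a back-of-the-envelope sketch. The constant is just the number recalled above (treat it as an assumption, not a verified value), and the function name and model sizes are hypothetical examples.

```python
# Assumed figure: ~2 bits of factual knowledge per parameter, as recalled above.
BITS_PER_PARAM = 2.0

def knowledge_capacity_gb(num_params, bits_per_param=BITS_PER_PARAM):
    """Rough factual-knowledge capacity in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_params * bits_per_param
    return total_bits / 8 / 1e9

# Example sizes chosen only for illustration.
for n in (1e9, 7e9, 70e9):
    print(f"{n / 1e9:.0f}B params -> ~{knowledge_capacity_gb(n):.2f} GB of facts")
```

Under that assumption, a 7B-parameter model would top out at roughly 1.75 GB of stored facts, regardless of how many bytes the weights themselves occupy on disk.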

lwansbrough • about 4 hours ago
Anyone with a billion dollars want to try this and report back?
nullc • about 4 hours ago
From the paper it appears that it's probably more useful on small-ish models.
lwansbrough • about 2 hours ago
What does it cost to train a model like 1-bit Bonsai? Anyone know?