FR version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
100% Positive
Analyzed from 302 words in the discussion.
Trending Topics
#lot#context#learn#data#genome#information#efficiency#learning#entirely#able

Discussion (1 Comments)Read Original on HackerNews
Instead of chasing some kind of holy grail that would alchemically make it learn complex abstractions from insufficient data, I suggest finding ways to make that stepped ladder from bulk data to pretraining to post-training to in-context learning to be a gradient, and to be a lot deeper.
>Many billions of years of evolution is our pre-training
Not just that, but also compression of knowledge happening in the society. Which is the majority of "our" intelligence. Let's also not forget that we are fairly specialized, and you're actually comparing against the entire society, a huge generational system, not a single individual. Pure biological capabilities of a single individual are pretty mediocre, if you look at a caveman you'll find his efficiency to be a lot less impressive despite having the same biological capacity. For example your ability to learn new math entirely depends on your prior math knowledge, which is a result of a humongous amount of gradient descent performed by generations of mathematicians with ever-increasing rate and abstraction complexity. A caveman won't be able to learn it that easily.
>Our genome is 3GB, about 1-2% protein coding. That is just not enough space to store the model parameters that are supposedly pretrained
1. A lot more information is encoded in implicit ways: our entire environment, our parents (genome can't reproduce itself!), and only then the genome.
2. Evolution had a lot more "compute" to spend on cramming the same stuff into a lot fewer dimensions.
>These comparisons are not including the multimodal data we see in our lifetimes. If you include all this sensory information, we’re probably in the 10s to 100s of billions of tokens range from birth to adulthood
What is this number, where does it come from? Tokenization is an artifact of current architectures, a step on that discrete compression ladder. Latent throughput of the brain is in the range of double digit bits per second, this is already pushing it to the max, actual amount of the information is a lot less.