Discussion (19 Comments), originally on HackerNews
It seems that the input layers of a Transformer are necessarily going to be doing the lowest-level work of syntax -> semantic augmentation, starting with things like tagging parts of speech. Similarly, the output layers are by necessity going to be concerned with mapping high-level representations back into surface-level word sequences. This leaves the middle layers to do the work of first recognizing deep enough patterns to support good-quality prediction, and then doing the high-level prediction itself, which is what RL is typically going to be trying to shape.
But for the tasks this paper uses for RL training, it's all about improving the way the net is manipulating concepts. So the middle layers are where the focus should be.
Note: RL is also used for tasks that aren't about conceptual manipulation, like instruct training. I bet that their result doesn't hold for that because the delta vs the foundation model is all about the selection of words and flow of the text, not the core understanding.
https://dnhkng.github.io/posts/rys/
It feels like it should be straightforward to integrate into LLMs a small network that controls the looping. Or just duplicate entire blocks of layers after the initial training.
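To make the "duplicate entire blocks of layers" idea concrete, here is a toy sketch. It assumes a model is nothing more than an ordered list of layer functions; `duplicate_block` and `run` are hypothetical names for illustration, not code from the linked post or from any library. In a real model the duplicated entries would share the same trained weights.

```python
from typing import Callable, List, Sequence

# Assumption: a "layer" is just a function from a value to a value.
Layer = Callable[[float], float]


def duplicate_block(
    layers: Sequence[Layer], start: int, end: int, repeats: int = 2
) -> List[Layer]:
    """Return a new layer list where layers[start:end] appears `repeats` times in a row."""
    if not (0 <= start < end <= len(layers)):
        raise ValueError("block must satisfy 0 <= start < end <= len(layers)")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    block = list(layers[start:end])
    # The same layer objects are reused, so no new parameters are introduced.
    return list(layers[:start]) + block * repeats + list(layers[end:])


def run(layers: Sequence[Layer], x: float) -> float:
    """Apply the layers in order."""
    for layer in layers:
        x = layer(x)
    return x


# Toy three-layer stack; the middle layer is the one that gets looped.
toy_layers: List[Layer] = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
looped = duplicate_block(toy_layers, start=1, end=2, repeats=3)
```

A learned controller, as suggested above, would amount to choosing `repeats` per input instead of fixing it ahead of time.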
RL post-training alters the parameters of the transformer, while your f(manifold) idea seems to suggest that a new layer on top would suffice, no need to alter the transformer itself at all.
It would be extremely handy if that were so, but I'm guessing it isn't, or it would be the prevailing approach.
Worth noting a different manifold "exists" after each transformation (e.g. layer). You only sample from the same manifold when you apply the same transformation(s).
Most errors are probably responses that didn’t finish before their 3K token limit. They’ve measured how well RL is able to shorten the response to their limit.
[0] not simply
So much left on the table