We already have LLMs that handle large contexts (Gemini's around 2M tokens, if I'm not mistaken). That's cool, but let's be honest: longer doesn't always mean better. Often, a longer context just means less attention to detail.
Titans propose a kind of long-term memory (yeah, LSTMs, I love seeing you again!) that learns at test time.
Let me repeat that: at test time!
The model dynamically identifies parts of the context and flags them as relevant or not, so it knows what to remember. It does this through a clever metric the authors defined: "Surprise", which basically measures how much a new piece of context deviates from what the model already expects. The more it surprises the model, the more attention it gets.
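To make the idea concrete, here's a minimal toy sketch of surprise-gated, test-time memory updates. This is my own illustration, not the paper's actual architecture: I'm assuming the memory is a simple linear map `W`, and I'm using the gradient norm of a reconstruction loss as a stand-in for the surprise signal (the paper's surprise is also gradient-based, but the real model is far more elaborate).

```python
import numpy as np

def surprise_update(W, key, value, base_lr=0.1):
    """One test-time gradient step on memory W so that W @ key ≈ value.

    Returns the updated memory and a scalar "surprise" score: the norm of
    the gradient. Content the memory already predicts well produces a small
    gradient, i.e. low surprise.
    """
    pred = W @ key
    error = pred - value                 # prediction error for this chunk
    grad = np.outer(error, key)          # gradient of 0.5 * ||W k - v||^2 wrt W
    surprise = np.linalg.norm(grad)      # bigger gradient = more surprising
    W = W - base_lr * grad               # learning happens at inference time
    return W, surprise

rng = np.random.default_rng(0)
d = 8
W = np.zeros((d, d))                     # empty memory to start
k = rng.normal(size=d)
k = k / np.linalg.norm(k)                # unit-norm key for a stable toy demo
v = rng.normal(size=d)

W, s1 = surprise_update(W, k, v)         # first encounter: very surprising
W, s2 = surprise_update(W, k, v)         # same chunk again: less surprising
print(s1 > s2)  # True: repeated content surprises the memory less
```

The point of the toy: nothing here is trained offline. The memory starts empty and only changes while processing the input stream, and the size of each change is driven by how badly the memory predicted that chunk.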