Here’s the thing with dense features. Train a model too long, especially a big one, and your patch-wise features start getting weird. Noisy. Over-smooth. Sometimes they just collapse.
To stop this, Meta introduced something called Gram Anchoring.
What is Gram Anchoring?
It’s a new kind of loss function that forces the structure of similarities between patch features to stay stable across long training. Basically, the model compares its current patch similarities to those from an earlier, more consistent checkpoint. It doesn’t care if the features drift a little, as long as the relationships between patches stay clean.
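To make the idea concrete, here’s a minimal NumPy sketch of that kind of loss. It compares Gram matrices (patch-to-patch cosine similarities) between the current features and an anchor checkpoint’s features; the function name and shapes are illustrative, not Meta’s actual implementation.

```python
import numpy as np

def gram_anchor_loss(student_feats, anchor_feats):
    """Penalize drift in patch-similarity structure, not in the features themselves.

    student_feats, anchor_feats: (num_patches, dim) arrays of patch features,
    e.g. the current model vs. an earlier, more stable checkpoint.
    """
    # L2-normalize each patch feature so Gram entries are cosine similarities
    s = student_feats / np.linalg.norm(student_feats, axis=-1, keepdims=True)
    a = anchor_feats / np.linalg.norm(anchor_feats, axis=-1, keepdims=True)

    # Gram matrices: (num_patches, num_patches) patch-to-patch similarity structure
    gram_s = s @ s.T
    gram_a = a @ a.T

    # Mean squared difference between the two similarity structures
    return np.mean((gram_s - gram_a) ** 2)
```

A nice property falls out of this formulation: if the features merely rotate in embedding space (every pairwise relationship preserved), the Gram matrices are identical and the loss is zero, so the model is free to drift as long as the patch relationships hold.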
This one trick fixes the feature degradation that hit DINOv2 and other SSL models. And it unlocks long training runs, even on 7B-parameter behemoths.
Bonus: they also tried a high-resolution version of Gram Anchoring where the teacher uses larger input images. That further smooths out patch inconsistencies.