In the iconic Super Mario video games, the hero gains new abilities not through practice or repetition, but simply by touching the right power-up. A fire flower lets him hurl fireballs. A star makes him invincible. One touch, and Mario is transformed. What if artificial intelligence could do the same?
This playful metaphor is more than whimsy: it perfectly captures the essence of a striking new paper from ICML 2024, “Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch”. The paper presents a radical idea: language models can gain new capabilities by “absorbing” other models, with no retraining, no additional data, and not even a GPU. In a field where progress is usually measured in millions of compute hours, this discovery lands like a fire flower.
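To make “absorbing without GPUs” concrete, here is a minimal sketch of the delta-parameter merging the paper builds on, centered on its DARE operation (drop a random fraction of each fine-tuned parameter delta, rescale the survivors). The function name `dare_merge` and the toy tensors below are illustrative stand-ins, not the authors' code:

```python
import torch

def dare_merge(base: dict, finetuned: dict, p: float = 0.9) -> dict:
    """Sketch of drop-and-rescale merging: pure tensor arithmetic, no training."""
    merged = {}
    for name, base_w in base.items():
        delta = finetuned[name] - base_w       # the task-specific "ability"
        mask = torch.rand_like(delta) >= p     # keep roughly (1 - p) of the entries
        delta = delta * mask / (1.0 - p)       # rescale so the expected delta is preserved
        merged[name] = base_w + delta
    return merged

# Hypothetical toy tensors standing in for real model state_dicts:
base = {"layer.weight": torch.zeros(4, 4)}
finetuned = {"layer.weight": torch.randn(4, 4)}
merged = dare_merge(base, finetuned, p=0.5)
```

Because the whole operation is element-wise arithmetic on weight tensors, it runs comfortably on a CPU, which is why the “no GPUs” claim is less magical than it first sounds.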
Why This Work Feels Like Magic
Traditionally, if we want a language model to follow instructions, solve math problems, and write code, we have to fine-tune it separately for each task. That means multiple training runs, substantial computational cost, and careful curation of training data. Each capability becomes a silo, isolated in its own model and optimized for one narrow purpose.