Donostia, Spain – April 8, 2025 – Multiverse Computing today launched two new AI models compressed by CompactifAI, Multiverse's AI compressor: 80 percent compressed versions of Llama 3.1-8B and Llama 3.3-70B.
Both models have 60 percent fewer parameters than the original models, 84 percent greater energy efficiency, 40 percent faster inference, and yield a 50 percent cost reduction without sacrificing accuracy, according to Multiverse. "AI developers can immediately plug the models into any application – edge, on-premise, or cloud," the company said.
Multiverse will release versions of the top LLMs compressed by CompactifAI over the coming months.
"Meta's Llama 4 launch underscores a major shift in AI: smaller, more powerful, and multimodal models are no longer optional; they're the new default," said Dmitry Zakharchenko, chief software officer at Blaize, a U.S. edge AI chip company. "As AI moves from cloud to edge, success depends on models that are efficient, affordable, and fully programmable."
Multiverse said CompactifAI is the first compressor of its kind, using quantum-inspired tensor networks to make AI systems more efficient and portable, reducing model size by up to 93 percent with only a 2-3 percent drop in accuracy, an astounding feat compared with the industry-standard 20-30 percent accuracy loss of 50-60 percent compression techniques.
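The intuition behind tensor-network compression is to replace a large dense weight matrix with a product of much smaller factors. As a rough, hypothetical illustration of the parameter-count arithmetic only (not CompactifAI's actual algorithm, which is proprietary), a truncated SVD of a single dense layer shows how factorization shrinks the parameter count:

```python
import numpy as np

# Illustrative sketch only: CompactifAI uses quantum-inspired tensor
# networks; a simple truncated SVD stands in here for the general idea
# of replacing a dense weight matrix with a smaller factorized form.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))  # a dense layer's weight matrix

# Factorize W and keep only the top-`rank` singular components,
# so W is approximated by A @ B with far fewer parameters.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = 64
A = U[:, :rank] * s[:rank]   # shape (1024, 64)
B = Vt[:rank, :]             # shape (64, 1024)

original_params = W.size               # 1024 * 1024 = 1,048,576
compressed_params = A.size + B.size    # 2 * 1024 * 64 = 131,072
print(f"compression: {1 - compressed_params / original_params:.0%}")
```

The ratio depends only on the matrix shape and the retained rank; real compressors choose per-layer ranks (or tensor-network bond dimensions) to trade size against accuracy.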
"CompactifAI is changing the economics of AI processing and opening up new use cases for AI models," said Enrique Lizaso Olmos, CEO of Multiverse Computing. "Efforts to curb unwieldy models have come up short. Our novel approach to compression, grounded in quantum-inspired methods, makes it possible to pair performance with processing efficiency and gives us a huge edge on LLM providers."
Multiverse Computing was founded in 2019 by pioneers in quantum-inspired software to develop novel solutions to complex business problems. In 2023 the company began applying its core technology to address the AI energy crisis with CompactifAI.
LLM providers have turned to techniques such as pruning and quantization to compress models but have yet to eliminate the tradeoff between size and performance. For instance, Llama3.1-8B Slim by CompactifAI requires 300x fewer training tokens than Meta's CAI Llama3, and 3x fewer training tokens than Nvidia's Llama3.1-Minitron, while outperforming across benchmarks. For Llama3.3-70B Slim by CompactifAI, comparative benchmarks show an increase in reasoning capabilities while maintaining original precision.
"We're rapidly delivering compressed versions of the most powerful LLMs in the world," said Sam Mugel, Chief Technology Officer at Multiverse. "The advanced capabilities of these two huge models can now fit into smartphones, laptops, and cars, or real-world machines like oil rigs and satellites. Our aggressive roadmap to roll out dozens of compressed, leading LLMs could dramatically accelerate the impact of AI in the real world."