As we’ve been anticipating, models have become more and more capable of understanding different types of inputs. We’ve seen image transformer models (see my blogs on fine-tuning Flux and the research behind MM1), and now we’re starting to see video models hit the scene.
In December of 2024, Meta unveiled their new Apollo family of models. Alongside the models, they also published a paper detailing their research and work on Large Multimodal Models (LMMs). The paper is full of great details, so rather than try to cover it all, I’ll focus on the four major design choices they highlighted when building their model.
Let’s dive in!
Embedding
Let’s first lay out some quick concepts that are important for understanding what’s going on here. Every Transformer relies on embeddings for its input. However, user input is usually first converted from something the user understands (text, videos) into tokens, and then into embeddings. To convert to embeddings, we use an embedding model. For multi-modal inputs, we typically use a different encoder for each input type.
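To make that concrete, here’s a minimal PyTorch sketch, my own illustration rather than Apollo’s actual architecture: a text encoder and a video encoder each map their modality into a shared embedding dimension, and the results are concatenated into one sequence the Transformer can consume. The vocabulary size, patch dimension, and embedding dimension are arbitrary assumptions.

```python
# Illustrative sketch only: each modality gets its own encoder, and both outputs
# are projected into the same embedding dimension so the Transformer can treat
# them as one token sequence. Sizes are made-up assumptions, not Apollo's.
import torch
import torch.nn as nn

EMBED_DIM = 512

class TextEncoder(nn.Module):
    """Maps token ids to embeddings via a learned lookup table."""
    def __init__(self, vocab_size: int = 32_000, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len) -> (batch, seq_len, embed_dim)
        return self.embed(token_ids)

class VideoEncoder(nn.Module):
    """Projects per-frame patch features into the shared embedding space."""
    def __init__(self, patch_dim: int = 768, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, patch_dim) -> (batch, num_patches, embed_dim)
        return self.proj(patch_features)

if __name__ == "__main__":
    text_enc, video_enc = TextEncoder(), VideoEncoder()
    text_emb = text_enc(torch.randint(0, 32_000, (1, 16)))  # 16 text tokens
    video_emb = video_enc(torch.randn(1, 64, 768))           # 64 video patches
    # Concatenate along the sequence dimension to form one multimodal input.
    multimodal_input = torch.cat([text_emb, video_emb], dim=1)
    print(multimodal_input.shape)  # torch.Size([1, 80, 512])
```

The key point is the last step: once every modality lives in the same embedding space, the Transformer downstream doesn’t need to know which tokens came from text and which came from video.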