
    Behind the Magic: How Tensors Drive Transformers

    By Team_AIBS News · April 26, 2025 · 5 min read


    Transformers have changed the way artificial intelligence works, especially in understanding language and learning from data. At the core of these models are tensors, a generalized form of mathematical matrices that carry and process information. As data moves through the different components of a Transformer, these tensors undergo a series of transformations that help the model make sense of things like sentences or images. Learning how tensors work inside Transformers can help you understand how today's smartest AI systems actually work and think.

    What This Article Covers and What It Doesn’t

    ✅ This Article IS About:

    • The flow of tensors from input to output inside a Transformer model.
    • Ensuring dimensional coherence throughout the computational process.
    • The step-by-step transformations that tensors undergo in different Transformer layers.

    ❌ This Article IS NOT About:

    • A general introduction to Transformers or deep learning.
    • The detailed architecture of Transformer models.
    • The training process or hyper-parameter tuning of Transformers.

    How Tensors Act Inside Transformers

    A Transformer consists of two main components:

    • Encoder: Processes the input data, capturing contextual relationships to create meaningful representations.
    • Decoder: Uses these representations to generate coherent output, predicting each element sequentially.

    Tensors are the fundamental data structures that pass through these components, undergoing a series of transformations that preserve dimensional coherence and correct information flow.

    Image from research paper: the standard Transformer architecture

    Input Embedding Layer

    Before entering the Transformer, raw input tokens (words, subwords, or characters) are converted into dense vector representations by the embedding layer. This layer functions as a lookup table that maps each token to a vector, capturing semantic relationships with other words.

    Image by author: tensors passing through the embedding layer

    For a batch of 5 sentences, each with a sequence length of 12 tokens and an embedding dimension of 768, the tensor shape is:

    • Tensor shape: [batch_size, seq_len, embedding_dim] → [5, 12, 768]

    After embedding, positional encoding is added, ensuring that order information is preserved without altering the tensor shape.
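    A minimal PyTorch sketch of this step (the vocabulary size and the use of learned positional embeddings are illustrative assumptions, not details taken from the article):

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim = 5, 12, 768
vocab_size = 30000  # assumed vocabulary size, for illustration only

# Token IDs for a batch of 5 sentences, 12 tokens each
token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))

# Embedding layer: a lookup table mapping each token ID to a 768-dim vector
embedding = nn.Embedding(vocab_size, embedding_dim)
x = embedding(token_ids)                      # [5, 12, 768]

# Positional encoding (here a learned embedding per position, for brevity)
positions = torch.arange(seq_len)             # [12]
pos_embedding = nn.Embedding(seq_len, embedding_dim)
x = x + pos_embedding(positions)              # broadcast over the batch -> still [5, 12, 768]

print(x.shape)  # torch.Size([5, 12, 768])
```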

    Modified image from research paper: position of this step in the overall workflow

    Multi-Head Attention Mechanism

    One of the most critical components of the Transformer is the Multi-Head Attention (MHA) mechanism. It operates on three matrices derived from the input embeddings:

    • Query (Q)
    • Key (K)
    • Value (V)

    These matrices are generated using learnable weight matrices:

    • Wq, Wk, Wv of shape [embedding_dim, d_model] (e.g., [768, 512]).
    • The resulting Q, K, V matrices have dimensions [batch_size, seq_len, d_model].
    Image by author: table showing the shapes/dimensions of the embedding, Q, K, V tensors
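    A minimal sketch of these projections in PyTorch, using bias-free `nn.Linear` layers to play the role of Wq, Wk, Wv:

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, d_model = 5, 12, 768, 512
x = torch.randn(batch_size, seq_len, embedding_dim)   # stand-in for the embedding output

# Learnable weight matrices Wq, Wk, Wv of shape [768, 512]
W_q = nn.Linear(embedding_dim, d_model, bias=False)
W_k = nn.Linear(embedding_dim, d_model, bias=False)
W_v = nn.Linear(embedding_dim, d_model, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)
print(Q.shape, K.shape, V.shape)  # each torch.Size([5, 12, 512])
```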

    Splitting Q, K, V into Multiple Heads

    For effective parallelization and improved learning, MHA splits Q, K, and V into multiple heads. Suppose we have 8 attention heads:

    • Each head operates on a subspace of size d_model / head_count.
    Image by author: multi-head attention
    • The reshaped tensor dimensions are [batch_size, seq_len, head_count, d_model / head_count].
    • Example: [5, 12, 8, 64] → rearranged to [5, 8, 12, 64] so that each head receives its own sequence slice (see the reshaping sketch below).
    Image by author: reshaping the tensors
    • Each head then gets its own share of Qi, Ki, Vi.
    Image by author: each Qi, Ki, Vi sent to a different head
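    A minimal sketch of this reshaping for one of the three tensors (the same operation applies to K and V):

```python
import torch

batch_size, seq_len, d_model, head_count = 5, 12, 512, 8
head_dim = d_model // head_count            # 64

Q = torch.randn(batch_size, seq_len, d_model)

# [5, 12, 512] -> [5, 12, 8, 64]: split the model dimension across 8 heads
Q_heads = Q.view(batch_size, seq_len, head_count, head_dim)

# [5, 12, 8, 64] -> [5, 8, 12, 64]: move the head axis forward so each head
# works on its own [seq_len, head_dim] slice
Q_heads = Q_heads.permute(0, 2, 1, 3)
print(Q_heads.shape)  # torch.Size([5, 8, 12, 64])
```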

    Attention Calculation

    Each head computes attention using the scaled dot-product formula: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where d_k is the per-head dimension.

    Once attention is computed for all heads, the outputs are concatenated and passed through a linear transformation, restoring the initial tensor shape.

    Image by author: concatenating the output of all heads
    Modified image from research paper: position of this step in the overall workflow
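    A sketch of scaled dot-product attention across all heads, followed by concatenation and the output projection. Mapping the projection back to the 768-dim embedding space is an assumption based on the shapes this article uses for the residual connection:

```python
import math
import torch
import torch.nn as nn

batch_size, head_count, seq_len, head_dim = 5, 8, 12, 64
d_model = head_count * head_dim       # 512
embedding_dim = 768                   # shape expected by the residual connection later

Q = torch.randn(batch_size, head_count, seq_len, head_dim)
K = torch.randn(batch_size, head_count, seq_len, head_dim)
V = torch.randn(batch_size, head_count, seq_len, head_dim)

# Scaled dot-product attention, computed for all heads at once
scores = Q @ K.transpose(-2, -1) / math.sqrt(head_dim)      # [5, 8, 12, 12]
weights = torch.softmax(scores, dim=-1)
heads_out = weights @ V                                      # [5, 8, 12, 64]

# Concatenate the heads, then apply the final linear projection, which here maps
# 512 back to the 768-dim embedding space so the residual connection can be applied
concat = heads_out.permute(0, 2, 1, 3).reshape(batch_size, seq_len, d_model)  # [5, 12, 512]
W_o = nn.Linear(d_model, embedding_dim)
output = W_o(concat)
print(output.shape)  # torch.Size([5, 12, 768])
```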

    Residual Connection and Normalization

    After the multi-head attention mechanism, a residual connection is added, followed by layer normalization:

    • Residual connection: Output = Embedding Tensor + Multi-Head Attention Output
    • Normalization: (Output − μ) / σ to stabilize training
    • The tensor shape remains [batch_size, seq_len, embedding_dim]
    Image by author: residual connection
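    In PyTorch this amounts to an element-wise addition followed by `nn.LayerNorm` (a minimal sketch; LayerNorm also applies learnable scale and bias parameters that the formula above omits):

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim = 5, 12, 768

embedding_out = torch.randn(batch_size, seq_len, embedding_dim)   # tensor entering the block
attention_out = torch.randn(batch_size, seq_len, embedding_dim)   # multi-head attention output

# Residual connection followed by layer normalization over the last dimension
residual = embedding_out + attention_out
layer_norm = nn.LayerNorm(embedding_dim)
normalized = layer_norm(residual)

print(normalized.shape)  # torch.Size([5, 12, 768]) -- shape unchanged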

    Feed-Forward Network (FFN)

    After attention and normalization, each position is passed independently through a position-wise feed-forward network: a linear layer that expands the representation, a non-linear activation, and a second linear layer that projects it back, so the tensor shape stays [batch_size, seq_len, embedding_dim].
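    A minimal sketch of such a position-wise FFN (the inner dimension of 3072 is an illustrative assumption; the article does not specify it):

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim = 5, 12, 768
d_ff = 3072   # assumed inner dimension (4x the embedding size, as in the original paper)

ffn = nn.Sequential(
    nn.Linear(embedding_dim, d_ff),   # expand: [5, 12, 768] -> [5, 12, 3072]
    nn.ReLU(),
    nn.Linear(d_ff, embedding_dim),   # project back: [5, 12, 3072] -> [5, 12, 768]
)

x = torch.randn(batch_size, seq_len, embedding_dim)
print(ffn(x).shape)  # torch.Size([5, 12, 768]) -- shape preserved
```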

    Masked Multi-Head Attention in the Decoder

    In the decoder, Masked Multi-Head Attention ensures that each token attends only to earlier tokens, preventing leakage of future information.

    Modified image from research paper: masked multi-head attention

    This is achieved using a lower-triangular mask of shape [seq_len, seq_len] with -inf values in the upper triangle. Applying this mask ensures that the Softmax function nullifies the future positions.

    Image by author: mask matrix
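    A sketch of building and applying such a causal mask in PyTorch:

```python
import torch

seq_len = 12

# Causal mask: 0 where attention is allowed, -inf strictly above the diagonal
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

scores = torch.randn(seq_len, seq_len)          # raw attention scores for one head
masked_scores = scores + mask                   # future positions become -inf
weights = torch.softmax(masked_scores, dim=-1)  # softmax sends the -inf entries to 0

print(weights[0])  # the first token attends only to itself; later columns are 0
```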

    Cross-Attention in Decoding

    Since the decoder does not fully understand the input sentence on its own, it uses cross-attention to refine its predictions. Here:

    • The decoder generates queries (Qd) from its input ([batch_size, target_seq_len, embedding_dim]).
    • The encoder output serves as keys (Ke) and values (Ve).
    • The decoder computes attention between Qd and Ke, extracting relevant context from the encoder's output.
    Modified image from research paper: cross-attention
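    A sketch of the cross-attention shapes, with queries from the decoder and keys/values from the encoder (the target sequence length of 10 and the per-head dimensions are illustrative assumptions):

```python
import math
import torch

batch_size, target_seq_len, src_seq_len = 5, 10, 12
head_count, head_dim = 8, 64

# Queries come from the decoder; keys and values come from the encoder output
Q_d = torch.randn(batch_size, head_count, target_seq_len, head_dim)
K_e = torch.randn(batch_size, head_count, src_seq_len, head_dim)
V_e = torch.randn(batch_size, head_count, src_seq_len, head_dim)

scores = Q_d @ K_e.transpose(-2, -1) / math.sqrt(head_dim)   # [5, 8, 10, 12]
weights = torch.softmax(scores, dim=-1)
context = weights @ V_e                                       # [5, 8, 10, 64]

print(context.shape)  # one context vector per decoder position, built from encoder values
```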

    Conclusion

    Transformers use tensors to help them learn and make good decisions. As the data moves through the network, these tensors go through different steps: being turned into numbers the model can understand (embedding), focusing on the important parts (attention), staying balanced (normalization), and passing through layers that learn patterns (feed-forward). These transformations keep the data in the right shape the whole time. By understanding how tensors move and change, we can get a better idea of how AI models work and how they can understand and create human-like language.


