FragmentStream Attention: Training a Transformer on a Budget | by Yash Rawal | Feb, 2025

By Team_AIBS News · February 10, 2025 · 11 min read


Have you ever wondered how large language models (LLMs) like GPT and Llama actually work? Sure, you can use pre-trained models with just a few lines of code, but the real challenge is understanding what’s happening behind the scenes. I decided to find out, and while it’s complex and frustrating, it’s entirely possible.

My journey began while working on a company project in which I learned to use the pretrained Llama 3.1 model. My curiosity took off from there: How does it work? How was it built? Could I make something like this myself?

While reading up on it, like any curious person, I stumbled upon the famous research paper ‘Attention Is All You Need’, and after reading it I finally understood the Transformer architecture.

‘I started my own bold experiment: a lean, mean 3-million-parameter model, trained for 20 hours on Kaggle’s free P100 GPU. Proof that you don’t need a supercomputer to chase big ideas, just dedication, curiosity, and a dash of resourcefulness!’

    I started by creating a easy character-level mannequin that solely required 80 characters as tokens and 1,000 strains of dataset. As I improved it, I ended up creating this mannequin utilizing strategies like byte-pair encoding for phrase and sub phrase (like suffix and prefix) primarily based tokenization and even stumbled upon some stunning discoveries, which I’ll focus on later in future articles. Across the identical time, after I first began studying about Transformers, I used to be amazed by their energy but additionally pissed off by how a lot reminiscence they consumed it was like constructing a sandcastle which is washed away by the waves time and again!

This frustration led me to explore ways to make Transformers more memory-efficient, eventually resulting in the idea of “FragmentStream Attention”.

But before diving into the details, let’s first look at why memory matters.

Transformers are the advanced technique dominating NLP and language models today because they can understand long sequences and recognize patterns in huge amounts of text data.

It’s simple: the more context and knowledge you want to store, the more memory you’ll need.

These Transformers are clever: they help you with tasks like translating languages, writing stories, and maybe articles too. They can do this because they have something called “attention”, which lets them focus on the most important parts of the input, and that’s exactly why they use so much memory during training.

What’s the Problem With Traditional Attention?

In traditional Transformers, attention works by comparing every word to every other word in a sentence. That means they build a huge grid, like a giant table, to keep track of how important each word is relative to all the others. This table grows really, really big when the text gets long.

Why This Is a Problem:

1. It uses too much memory: the grid gets bigger and bigger as the text gets longer. If the text is 1,000 words, the grid is 1,000 x 1,000! That’s HUGE.
2. It’s slow: Transformers have to fill in every box in the grid, which takes a lot of time.

Let’s look at an example in code:

# Traditional attention (simplified)
B, T, C = x.shape  # B=batch size, T=sequence length, C=embedding dimension
q = self.query(x)  # (B, T, C)
k = self.key(x)    # (B, T, C)
v = self.value(x)  # (B, T, C)

# Store ALL attention scores at once!
attention_scores = q @ k.transpose(-2, -1)  # (B, T, T) - this is huge!
attention = softmax(attention_scores) @ v   # even more memory usage
# Imagine T is 1,000: that's 1,000 x 1,000 = 1,000,000 boxes!
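To put a rough number on it (a back-of-the-envelope sketch with illustrative sizes, ignoring activations and gradients), the score matrix alone grows quadratically with sequence length:

# Rough size of the (B, T, T) score matrix in float32 (illustrative sizes)
B, T = 32, 1000
bytes_per_float = 4
score_bytes = B * T * T * bytes_per_float
print(f"{score_bytes / 1e6:.0f} MB just for the scores")  # 128 MB per attention head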

If you’ve ever tried to carry too many groceries at once, you know how hard it is. That’s what happens to Transformers when the grid gets too big: they drop everything.

Now, let’s talk about how FragmentStream Attention fixes this! What I did is simply divide the process into batches. Instead of looking at the whole book at once, it splits the book into small chunks, or fragments, and works on one piece at a time.

Imagine reading one page of a book, writing down the important points, and then moving on to the next page. Once you have read all the pages, you put the notes from every page together to form a complete understanding of the whole book. This step-by-step approach ensures nothing is overlooked while keeping the process efficient. That’s what FragmentStream Attention does.

Key Ideas Behind FragmentStream Attention:

1. Break the text into pieces: it divides the text into smaller parts (like 128 words at a time).
2. Handle one part at a time: it only compares the words within each piece instead of looking at the whole text at once.
3. Combine the results: after working on each part, it adds everything together to get the final answer.
4. Keep it organized: it still remembers the order of the text so everything makes sense.

This is what it looks like in Python code:

# FragmentStream attention (simplified)

fragment_size = 128  # Process 128 tokens at a time
for i in range(0, T, fragment_size):          # Process queries in fragments
    q_fragment = q[:, i:i+fragment_size]      # Take a small group of queries
    for j in range(0, T, fragment_size):      # Process keys/values in fragments
        k_fragment = k[:, j:j+fragment_size]  # Take a small group of keys
        v_fragment = v[:, j:j+fragment_size]  # And the corresponding values
        # Compare only these small fragments
        scores = q_fragment @ k_fragment.transpose(-2, -1)
        # Process and accumulate results
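One subtlety worth calling out: softmax has to be normalized over a query’s entire row of scores, not just the scores within one fragment. That is why the full implementation below first collects a whole (fragment_size x T) stripe of masked scores for each query fragment and only then applies softmax, instead of normalizing each small block on its own.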

And this is how I imagine it works on the hardware:

    [Full Matrix in Memory]      [fragment 1] [Clean Up] [fragment 2] [Clean Up]
    X X X X X X X X X X ➜ X X X ➜ X X X ➜
    X X X X X X X X X X ➜ X X X ➜ X X X ➜
    X X X X X X X X X X ➜ X X X ➜ X X X ➜
    X X X X X X X X X X ➜ X X X ➜ X X X ➜

Yeah! I know it might look funny, but it makes a significant difference.

And this is how I applied it in my model:

import torch
import torch.nn as nn
import torch.nn.functional as F

class FragmentStream_Attention(nn.Module):
    """
    FragmentStream Attention module.

    Args:
    - head_size: Dimensionality of attention heads
    - block_size: Maximum sequence length
    - dropout: Regularization rate
    - fragment_size: Size of text fragments to process (default 128 tokens)
    """
    def __init__(self, head_size, block_size, dropout, fragment_size=128):
        super().__init__()
        self.head_size = head_size
        self.fragment_size = fragment_size
        self.dropout = nn.Dropout(dropout)
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, q, k, v):
        B, T, C = q.shape

        # Initialize output tensor
        out = torch.zeros_like(v)

        # Process attention in fragments to save memory
        for i in range(0, T, self.fragment_size):
            j_end = min(T, i + self.fragment_size)

            # Current fragment of queries
            q_fragment = q[:, i:j_end]

            # Scores for this query fragment against the full sequence
            attn_weights = torch.zeros(B, j_end - i, T, device=q.device)

            for j in range(0, T, self.fragment_size):
                k_fragment = k[:, j:min(T, j + self.fragment_size)]

                # Compute scaled attention scores for this block
                scores = (q_fragment @ k_fragment.transpose(-2, -1)) * (C ** -0.5)

                # Apply the causal mask
                scores = scores.masked_fill(
                    self.tril[i:j_end, j:min(T, j + self.fragment_size)] == 0,
                    float('-inf')
                )

                attn_weights[:, :, j:min(T, j + self.fragment_size)] = scores

            # Softmax over the entire sequence length
            attn_weights = F.softmax(attn_weights, dim=-1)
            attn_weights = self.dropout(attn_weights)

            # Weighted sum of values, again in fragments
            for j in range(0, T, self.fragment_size):
                v_fragment = v[:, j:min(T, j + self.fragment_size)]
                out[:, i:j_end] += attn_weights[:, :, j:min(T, j + self.fragment_size)] @ v_fragment

        return out
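A quick way to sanity-check the module (a minimal sketch continuing from the class above, with made-up sizes) is to compare it against ordinary full-matrix causal attention on small random tensors; with dropout disabled, the two should agree to within floating-point error:

# Smoke test: FragmentStream vs. ordinary full-matrix causal attention
torch.manual_seed(0)
B, T, C = 2, 256, 64  # illustrative sizes
q, k, v = (torch.randn(B, T, C) for _ in range(3))

attn = FragmentStream_Attention(head_size=C, block_size=T, dropout=0.0, fragment_size=128)
attn.eval()  # make sure dropout is off so the comparison is deterministic
out = attn(q, k, v)

# Reference: the full (T, T) grid computed in one shot
scores = (q @ k.transpose(-2, -1)) * (C ** -0.5)
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
scores = scores.masked_fill(~mask, float('-inf'))
ref = torch.softmax(scores, dim=-1) @ v

print(torch.allclose(out, ref, atol=1e-5))  # expected: True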

How FragmentStream Attention Works Inside

Let’s look deeper into what happens when FragmentStream Attention reads the text:

Step-by-Step Explanation:

1. Break the text into chunks: if the text has 1,000 words and each fragment holds 128 words, it splits the text into about 8 parts.
2. Compare within each fragment: it looks at the words in each fragment to figure out which ones are important.
3. Write down the results: after working on each fragment, it records the important findings.
4. Put everything together: at the end, it combines the results from all the fragments to get the complete answer.

By doing this cleverly, it saves a TON of memory and works faster. Plus, it runs on low-end or older hardware like NVIDIA’s P100 GPU!
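To see where the saving comes from (again a rough sketch with illustrative numbers): the biggest score buffer alive at any one time is a (fragment_size x T) stripe instead of the full (T x T) grid, so peak memory shrinks by roughly T / fragment_size:

# Peak score-buffer memory per batch element, float32 (illustrative)
T, fragment_size = 1000, 128
full = T * T * 4                    # whole grid at once: 4.0 MB
fragmented = fragment_size * T * 4  # one stripe at a time: ~0.5 MB
print(f"{full / fragmented:.1f}x smaller peak")  # ~7.8x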

Here is the flow chart of my model:

[Figure: FragmentStream_Attention complete implementation architecture in a Transformer-based model]

You can also check out this experiment in the accompanying Kaggle notebook and GitHub repository.

1. Balancing Memory and Accuracy: splitting the text into fragments without losing important details was tricky. I learned that choosing the right fragment size (like 128 tokens) is super important.
2. Understanding Trade-offs: while FragmentStream Attention saves memory, it’s not perfect for every situation. For really short texts, traditional attention might still work better.
3. Not Really Sure Yet: I haven’t tested this model on a truly huge dataset. I tried it on Kaggle’s free P100 GPU with 26,075,321 training samples and 1,368,819 test samples from a healthcare-related dataset by NIH Senior Health, and here is a glimpse of the output:
query = "Hello"
test_generation(query, temperature=0.0)  # generation helper from the accompanying notebook

Generating answer (temperature=0.0)...

Results:
A: Hello Hi , I have gone through your query and understand your concern . You have a history of allergic reaction to the skin . It is a common problem of the skin . It is a common condition of the skin . It is a common cause of the skin . It is a common cause of the skin . It is a fungal infection . It is a common cause of the skin infection . It is a common cause of the skin . It is a fungal infection . It is a common cause of the skin infection . It is a fungal infection . It is a common cause of the skin infection . It is a fungal infection . It is a fungal infection . It is a common cause of the skin . It is a fungal infection . It is a common cause of the skin . It is a fungal infection . It is a fungal infection . It is a common cause of the skin infection . It is a fungal infection . It is a common cause of the skin . It is a fungal infection . It is a common

query = "I am having pain in my head"
test_generation(query, temperature=0.5)

Generating answer (temperature=0.5)...

Results:
A: I am having pain in my head Thanks for writing to Chat Doctor . Since you have history of pain in chest pain , I would suggest you to rule out cardiac cause for your symptoms . You may have to see a doctor for a diagnosis and treatment . Until then , you may have to take an antacid like omeprazole and antacid . When you should go for a chest x - ray and chest x - ray and blood test to rule out any cardiac diseases . If needed x - ray is normal then no need to worry . Hope I have answered your question , if you have doubt then I will be happy to answer . Thanks for using Chat Doctor . Wish you a good health . Hi , I am a 32 year old woman . I have been experiencing pain in my left side of my left shoulder blade and on the lower left side of my neck . I have had a pain in my back . I had a pain in my left arm . It has gotten worse . I had a small bruise on my back of my back .

Here is where I see this going next:

Chatbots: to develop a simple chatbot that can answer more intelligently, using context to generate good responses, but at a very, very low cost.

Expert Systems: to develop a domain-specific expert system instead of a general-purpose AI.

Efficiency: to develop language models that are as efficient and fast as possible.

In conclusion, my research into FragmentStream Attention is still a work in progress. As you can see from the outputs, the model is currently just predicting the next word over and over, which is expected given that the dataset is basic and focused on simple prompts. My goal is to refine it further and make it more efficient, eventually creating a very, very lightweight language model that works like more advanced ones.

The idea behind FragmentStream Attention is to make Transformers memory-efficient enough to run on smaller hardware without losing their ability to understand complex language. While it has shown promising results, there is still a lot to improve, especially when working with larger and more diverse datasets.

I plan to make this project open source; I’m sharing it on GitHub and Kaggle for the community to explore, contribute to, and improve. I truly appreciate any feedback or contributions that help make this project even better. Thanks for reading! This was my first article, so thank you for your interest and support!


