    AIBS News

    Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in your pocket | Jan, 2025

By Team_AIBS News | January 31, 2025 | 3 min read


Large Language Models (LLMs) have been improving rapidly, moving them closer to Artificial General Intelligence (AGI), the kind of AI that can think and reason like humans.

One of the biggest improvements in recent years is post-training, a step performed after the initial model training. It helps LLMs:

• Think better (improving reasoning skills).

• Align with human values (reducing harmful outputs).

• Personalize responses based on user preferences.

• Do all of this without using as much computing power as training from scratch.

A breakthrough came with OpenAI's o1 models, which extended the reasoning process at inference time (when the model is generating responses). This means the model takes more time to think before answering, which significantly improves its performance on tasks like maths, coding, and scientific reasoning.

However, scaling this reasoning ability effectively during real-time use (test-time scaling) is still an open challenge.

Researchers have tried different methods to enhance reasoning, including:

• Reward models (evaluating how good a response is).

• Reinforcement learning (RL) (teaching the model through trial and error).

• Search algorithms (Monte Carlo Tree Search, beam search, etc.).

So far, none of these methods have matched OpenAI's o1 models in reasoning.
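Of the search methods listed above, beam search is the easiest to sketch. Below is a minimal, model-agnostic version; the `expand` and `score` callables are hypothetical stand-ins for a language model proposing next reasoning steps and a reward model rating a partial sequence, not any specific API from the paper:

```python
import heapq

def beam_search(expand, score, start, beam_width=3, depth=4):
    """Generic beam search: at each depth, keep only the `beam_width`
    highest-scoring partial sequences instead of exploring all of them.

    expand(seq) -> list of candidate next steps for a sequence
    score(seq)  -> numeric quality of a (partial) sequence
    """
    beam = [start]
    for _ in range(depth):
        # Extend every sequence in the beam with every candidate step.
        candidates = [seq + [step] for seq in beam for step in expand(seq)]
        if not candidates:
            break
        # Prune back down to the top `beam_width` sequences.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)
```

With a toy expander that always offers steps `0` and `1` and a score that sums the steps, the search greedily assembles the all-ones sequence.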

    What This Paper Introduces

The paper explores a new way to improve reasoning using pure reinforcement learning (RL), meaning no supervised data (human-labeled examples). Instead, the model learns on its own through an RL framework called GRPO (we'll discuss this in some depth).
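GRPO's core trick can be sketched in a few lines: instead of training a separate value model as a baseline (as PPO does), each sampled response's reward is normalized against the other responses in its group. A minimal sketch of just that advantage computation, with the clipping and KL-penalty terms of the full objective omitted:

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: for a group of responses sampled from
    the same prompt, advantage_i = (r_i - mean(group)) / std(group).
    The group statistics replace a learned value baseline."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses scoring above their group's mean get positive advantages (reinforced), those below get negative ones, with no extra network to train.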

Using DeepSeek-V3-Base as the foundation, they trained a model called DeepSeek-R1-Zero. Over thousands of RL steps, the model:

• Developed powerful reasoning skills.

• Improved its AIME 2024 benchmark score from 15.6% to 71.0% (and even 86.7% with majority voting).

• Matched the reasoning ability of OpenAI-o1-0912.
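The majority-voting figure refers to sampling many independent reasoning chains for the same problem, extracting each final answer, and keeping the most common one. A minimal sketch:

```python
from collections import Counter

def majority_vote(answers):
    """Majority voting over sampled final answers: return the answer
    that appears most often across the sampled reasoning chains."""
    return Counter(answers).most_common(1)[0][0]
```

Because independent samples tend to make different mistakes but converge on correct answers, the vote lifts accuracy above any single sample (here, 71.0% to 86.7%).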

However, DeepSeek-R1-Zero had some problems:

• Poor readability.

• Language mixing (it struggled to keep responses in a single language).

To fix these issues, they introduced DeepSeek-R1, which combines:

• Cold-start fine-tuning (training with a small amount of labeled data).

• Reinforcement learning focused on reasoning.

• Supervised fine-tuning (SFT) using high-quality human-labeled data.

After these steps, DeepSeek-R1 matched OpenAI-o1-1217 in reasoning.
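The steps above form a sequential pipeline: each stage's output model becomes the next stage's starting point. A toy sketch of that chaining, where the stage names are informal labels for the three steps listed and the lambdas are placeholders for real training runs:

```python
def run_pipeline(model, stages):
    """Chain training stages in order; each stage maps model -> model."""
    for _name, stage in stages:
        model = stage(model)
    return model

# Informal labels for the three steps described above (placeholders,
# not the paper's exact stage names). Each toy "stage" just records
# that it ran by appending its label.
STAGES = [
    ("cold_start_sft", lambda m: m + ["cold_start_sft"]),
    ("reasoning_rl",   lambda m: m + ["reasoning_rl"]),
    ("general_sft",    lambda m: m + ["general_sft"]),
]
```

The point of the structure is the ordering: the cold-start data stabilizes the model before RL, and the final SFT cleans up what RL produced.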

Final Contribution: Model Distillation

They also distilled DeepSeek-R1 into smaller models (like Qwen2.5-32B), showing that:

• Larger models learn better reasoning patterns.

• Smaller models can inherit this knowledge without needing complex RL training.

Their 14B distilled model even outperformed the best open-source models, setting new benchmarks in reasoning for dense models.
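Notably, "no complex RL training" means this distillation is plain supervised fine-tuning on teacher outputs: the large model generates reasoning traces, and the small model is fine-tuned on the resulting (prompt, response) pairs. A minimal sketch of building such a dataset, where `teacher_generate` is a hypothetical stand-in for sampling from DeepSeek-R1:

```python
def distill_dataset(teacher_generate, prompts):
    """Build (prompt, response) pairs from a teacher model's outputs.
    A smaller student is then fine-tuned on these pairs with ordinary
    SFT; no RL is run on the student itself."""
    return [(prompt, teacher_generate(prompt)) for prompt in prompts]
```

The student never sees rewards or group advantages, only the teacher's finished reasoning traces, which is what makes the approach cheap enough for small models.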

Hence:

• DeepSeek introduced two main models, DeepSeek-R1 and DeepSeek-R1-Zero.

• They also released distilled versions of DeepSeek-R1, mainly for deployment purposes.

• The key discovery is using reinforcement learning directly to improve reasoning.


