    Machine Learning

    #1minPapers MSFT’s rStar-Math small language model self-improves and generates own training data | by Gwen Cheni | Jan, 2025

By Team_AIBS News | January 12, 2025 | 2 Mins Read


This is the second time in recent months that a small model has performed as well as (or better than) billion-parameter large models. Granted, math problems are a special case: largely quantifiable and verifiable.

"Unlike solutions relying on superior LLMs for data synthesis, rStar-Math leverages smaller language models (SLMs) with Monte Carlo Tree Search (MCTS) to establish a self-evolutionary process, iteratively generating higher-quality training data."

Result: "Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems … it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%."

Process reward modeling (PRM) provides fine-grained feedback on intermediate steps, because incorrect intermediate steps significantly degrade data quality in math.

The SLM samples candidate nodes, each generating a one-step CoT and the corresponding Python code. Only nodes whose Python code executes successfully are retained, mitigating errors in intermediate steps. MCTS automatically assigns (self-annotates) a Q-value to each intermediate step based on its contribution: steps that contribute to more trajectories leading to the correct answer receive higher Q-values and are considered higher quality.
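The two filtering mechanisms above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `Step`, `code_executes`, and `self_annotate` are hypothetical names, the `exec` call stands in for a real sandboxed interpreter, and the Q-update is a simple running average of rollout outcomes standing in for the full MCTS back-propagation.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One candidate reasoning step: a one-step CoT plus its Python code."""
    cot: str
    code: str
    q_value: float = 0.0
    visits: int = 0

def code_executes(code: str) -> bool:
    """Retain a node only if its Python snippet runs without error
    (a real system would execute this in an isolated sandbox)."""
    try:
        exec(code, {})
        return True
    except Exception:
        return False

def self_annotate(trajectories: list[tuple[list[Step], bool]]) -> None:
    """Back-propagate terminal correctness into per-step Q-values:
    steps appearing on more correct trajectories end up with higher Q."""
    for steps, reached_correct_answer in trajectories:
        reward = 1.0 if reached_correct_answer else -1.0
        for step in steps:
            step.visits += 1
            # running average of rollout rewards (MCTS-style Q update)
            step.q_value += (reward - step.q_value) / step.visits
```

A step shared by many correct rollouts drifts toward Q = 1, while a step that only appears on failed rollouts drifts toward Q = −1, giving the self-annotation the article describes without any human step-level labels.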

An SLM is trained as a process preference model (PPM) to predict reward labels for each math reasoning step. Although the Q-values are not precise, they can reliably distinguish positive (correct) steps from negative (irrelevant/incorrect) ones. Using preference pairs with a pairwise ranking loss, instead of using Q-values directly as reward labels, eliminates the inherent noise and imprecision in stepwise reward assignment.
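The key idea, only the *ordering* of Q-values is trusted, can be sketched with a Bradley-Terry-style pairwise loss. This is an illustrative sketch under assumed names (`ppm_pairwise_loss`, `build_preference_pairs`), not the paper's training code, and the top-2/bottom-2 pairing rule is a simplification of the paper's selection scheme.

```python
import math

def ppm_pairwise_loss(score_pos: float, score_neg: float) -> float:
    """Pairwise ranking loss -log(sigmoid(s_pos - s_neg)): pushes the PPM
    to score a correct step above an incorrect one, without ever treating
    the noisy Q-value magnitudes as regression targets."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_pos - score_neg))))

def build_preference_pairs(steps: list[tuple[str, float]]) -> list[tuple]:
    """Form (positive, negative) pairs from Q-value ordering only:
    highest-Q steps serve as positives, lowest-Q steps as negatives."""
    ranked = sorted(steps, key=lambda s: s[1], reverse=True)
    positives, negatives = ranked[:2], ranked[-2:]
    return [(p, n) for p in positives for n in negatives]
```

At equal scores the loss is log 2, and it shrinks as the margin between the positive and negative step grows, so training only needs the PPM to rank steps correctly, exactly the robustness-to-noisy-Q property the article highlights.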

    Paper on arXiv: https://arxiv.org/abs/2501.04519


