
    #1minPapers MSFT’s rStar-Math small language model self-improves and generates own training data | by Gwen Cheni | Jan, 2025

    By Team_AIBS News | January 12, 2025 | 2 Mins Read


    This is the second time in recent months that a small model has performed as well as (or better than) much larger, billion-parameter models. Granted, math problems are unique: largely quantifiable and verifiable.

    “Unlike solutions relying on superior LLMs for data synthesis, rStar-Math leverages smaller language models (SLMs) with Monte Carlo Tree Search (MCTS) to establish a self-evolutionary process, iteratively generating higher-quality training data.”

    Result: “4 rounds of self-evolution with millions of synthesized solutions for 747k math problems … it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%.”

    Process reward modeling (PRM) provides fine-grained feedback on intermediate steps, which matters because incorrect intermediate steps significantly decrease data quality in math.

    The SLM samples candidate nodes, each generating a one-step chain of thought (CoT) and the corresponding Python code. Only nodes whose Python code executes successfully are retained, which mitigates errors in intermediate steps. MCTS automatically assigns (self-annotates) a Q-value to each intermediate step based on its contribution: steps that contribute to more trajectories leading to the correct answer are given higher Q-values and are considered higher quality.
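
    The two ideas in this paragraph, keeping only steps whose code runs and scoring steps by how often they lead to a correct answer, can be sketched in a few lines of Python. This is a simplified illustration and not the paper's MCTS implementation: the function names, the `(cot_text, code)` tuple layout, and the plain success-ratio used as a stand-in for the Q-value are all assumptions made for this example.

```python
from collections import defaultdict


def runs_cleanly(code: str) -> bool:
    """Return True if the candidate step's Python code executes without raising."""
    try:
        exec(code, {})  # illustration only: a real system would sandbox this
        return True
    except Exception:
        return False


def filter_candidates(candidates):
    """Keep only (cot_text, code) candidates whose code executes successfully."""
    return [c for c in candidates if runs_cleanly(c[1])]


def annotate_q_values(trajectories):
    """Score each step by the fraction of trajectories through it that end correctly.

    `trajectories` is a list of (steps, is_correct) pairs, where `steps` is a
    tuple of step identifiers. Each step is keyed by its full prefix, so the
    same text reached via different paths is scored separately.
    Assumption: a simple success ratio stands in for the MCTS Q-value update.
    """
    visits = defaultdict(int)
    wins = defaultdict(int)
    for steps, is_correct in trajectories:
        for i in range(1, len(steps) + 1):
            prefix = tuple(steps[:i])
            visits[prefix] += 1
            wins[prefix] += int(is_correct)
    return {prefix: wins[prefix] / visits[prefix] for prefix in visits}


# Hypothetical candidates: the second one fails at execution time and is dropped.
candidates = [
    ("Add the two numbers.", "x = 2 + 3"),
    ("Divide by the difference.", "y = 1 / (3 - 3)"),
]
kept = filter_candidates(candidates)

# Hypothetical rollouts: step "a" followed by "b" succeeds, followed by "c" fails.
rollouts = [(("a", "b"), True), (("a", "c"), False), (("a", "b"), True)]
q_values = annotate_q_values(rollouts)
```

    In this toy run, the shared first step ends up with an intermediate score, while the branch that always reaches the correct answer scores highest and the branch that never does scores lowest.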

    An SLM also serves as a process preference model (PPM) that predicts reward labels for each math reasoning step. Although Q-values aren’t precise, they can reliably distinguish positive (correct) steps from negative (irrelevant or incorrect) ones. Training on preference pairs with a pairwise ranking loss, instead of using Q-values directly as reward labels, eliminates the inherent noise and imprecision in stepwise reward assignment.
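
    To make the preference-pair idea concrete, here is a minimal sketch of a pairwise ranking loss of the common form -log(sigmoid(score_pos - score_neg)). The pair-selection rule (top-k versus bottom-k steps by Q-value), the function names, and the example numbers are assumptions for illustration; the exact pairing procedure and loss normalization are described in the paper itself.

```python
import math


def build_preference_pairs(q_values, k=2):
    """Pair the k highest-Q steps (positives) with the k lowest-Q steps (negatives).

    Only the ordering of Q-values is used, not their magnitudes.
    """
    ranked = sorted(q_values, key=q_values.get, reverse=True)
    positives, negatives = ranked[:k], ranked[-k:]
    return [
        (p, n)
        for p in positives
        for n in negatives
        if q_values[p] > q_values[n]
    ]


def pairwise_ranking_loss(scores, pairs):
    """Mean of -log(sigmoid(score_pos - score_neg)) over all preference pairs."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    total = 0.0
    for pos, neg in pairs:
        margin = scores[pos] - scores[neg]
        total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return total / len(pairs)


# Hypothetical noisy Q-values for four candidate steps at the same position.
q_values = {"s1": 0.9, "s2": 0.7, "s3": 0.2, "s4": -0.5}
pairs = build_preference_pairs(q_values, k=2)

# Hypothetical model scores: one model ranks the steps well, one ranks them badly.
good_scores = {"s1": 2.0, "s2": 1.5, "s3": -1.0, "s4": -2.0}
bad_scores = {"s1": -2.0, "s2": -1.5, "s3": 1.0, "s4": 2.0}
good_loss = pairwise_ranking_loss(good_scores, pairs)
bad_loss = pairwise_ranking_loss(bad_scores, pairs)
```

    Because the loss depends only on which step of each pair is preferred, a model is penalized for ranking a negative step above a positive one but is never asked to reproduce the exact, noisy Q-value.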

    Paper on arXiv: https://arxiv.org/abs/2501.04519


