    Paper Explained 316: NuminaMath. The NuminaMath dataset is a… | by Ritvik Rastogi | Feb, 2025

    By Team_AIBS News, February 24, 2025, 7 min read


    The NuminaMath dataset is a comprehensive collection of 860k pairs of competition math problems and solutions. Problems range from high-school level to advanced competition level, and all are annotated with accompanying chain-of-thought (CoT) traces. The dataset is designed to enhance the mathematical reasoning capabilities of LLMs and is the largest math dataset ever released in the field.

    The project is available on GitHub.

    Datasets and sample sizes.

    The data sources include Chinese high school math exercises, US and international mathematics olympiad problems, and problems collected from online forums.

    • MATH and GSM8K: Existing reference solutions are reformatted into a Chain-of-Thought (CoT) format using GPT-4, following recommendations from DeepSeekMath and ReAlign.
    • Orca-Math: Regular expressions are used to extract and simplify answers from the solution text of the original dataset. Answers are then enclosed in \boxed{} for consistent formatting. The authors note that if a well-formatted version of Orca-Math is already present in the training data, this step may be redundant. An alternative is to use GPT-4 to generate the final solution.
    • AMC and AIME: Problems and LaTeX-formatted solutions are collected from the Art of Problem Solving (AoPS) wiki. The first solution containing a \boxed{} symbol is selected. Because these problems overlap with the MATH dataset, an embedding-based decontamination process is applied, leaving roughly 4,300 problems for training. The final solutions are then realigned into the CoT format using GPT-4.
    • AoPS Forum: Problems are crawled from the AoPS Contest Collection page. Since solutions are not explicitly provided, replies containing \boxed{} symbols are considered, prioritizing those with the most LaTeX. The selected reply is treated as the reference solution and rewritten in CoT format by GPT-4.
    • Chinese K-12 Exam: K-12 math exercises are collected from exam papers obtained from publicly available resources. OCR and regex-based segmentation are used to extract problem-solution pairs from PDFs. GPT-4 is then used to translate the solutions and realign them into the CoT format.
    • Synthetic Data: Synthetic problems are generated from the MATH and AMC-AIME training splits following the Xwin-Math method. Unlike the original method, the solution from the initial generation stage (GPT-4 with a temperature of 0.8) is used directly to reduce costs.
    • World Olympiads Data: 152K problem-solution pairs are collected from various sources:
      • International contests and their shortlists (e.g., IMO, APMO, BMO).
      • National and regional contests (see Figure 2 in the original text for the breakdown by country).
      • Problem-solving forums, puzzle and olympiad books, and summer school materials.
      • PDFs are the primary source format; HTML content is converted to PDF. A pipeline (described elsewhere in the original text) is then applied to process these problems.
    Number of samples per data source.
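Several of the pipelines above depend on wrapping or locating a \boxed{} answer and, for the AoPS forum, choosing which reply to keep. The Python sketch below illustrates these steps under stated assumptions: `box_answer`, `extract_boxed`, and `pick_reference_reply` are hypothetical names, extracting the last \boxed{} is a common convention rather than something the source specifies, and the "most LaTeX" heuristic (counting `$` delimiters plus backslash commands) is an illustrative proxy, not the authors' code.

```python
import re
from typing import Optional


def box_answer(answer: str) -> str:
    """Orca-Math style: wrap an already-extracted final answer in \\boxed{}."""
    return "\\boxed{" + answer.strip() + "}"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honouring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as "no answer"


def latex_score(text: str) -> int:
    """Crude proxy for the amount of LaTeX: math delimiters plus backslash commands."""
    return text.count("$") + len(re.findall(r"\\[A-Za-z]+", text))


def pick_reference_reply(replies: list[str]) -> Optional[str]:
    """Among replies that contain a \\boxed{} answer, keep the one with the most LaTeX."""
    candidates = [r for r in replies if extract_boxed(r) is not None]
    if not candidates:
        return None
    return max(candidates, key=latex_score)
```

Replies without a boxed answer are never chosen, which mirrors the rule that only replies with \boxed{} symbols are considered; the selected reply would then go to GPT-4 for CoT rewriting.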

    Decontamination

    The following two-step decontamination strategy is used:

    • Problems with any exact 10-gram match against the evaluation benchmarks are removed from all datasets, except the synthetic dataset and the MATH train set.
    • For more thorough decontamination, Mistral embeddings are computed for every problem except those in the MATH train set and the synthetic datasets. All problems with an embedding distance < 0.15 to a benchmark problem are then removed. This threshold was derived empirically: above it, no contamination was observed in internal checks.
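The two decontamination checks can be sketched in Python as follows. This is a minimal illustration under assumptions the source does not fix: problems are tokenized by lower-cased whitespace splitting, "embedding distance" is taken to be cosine distance, and embeddings (e.g. from Mistral's embedding model) are assumed to be precomputed lists of floats. The function names are hypothetical.

```python
import math


def ngrams(text: str, n: int = 10) -> set[tuple[str, ...]]:
    """Lower-cased whitespace n-grams (the tokenization is an assumption)."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def decontaminate(
    train: list[tuple[str, list[float]]],
    bench_texts: list[str],
    bench_embs: list[list[float]],
    n: int = 10,
    threshold: float = 0.15,
) -> list[str]:
    """Return the training problems that survive both checks."""
    bench_grams: set[tuple[str, ...]] = set()
    for t in bench_texts:
        bench_grams |= ngrams(t, n)
    kept = []
    for text, emb in train:
        # Step 1: drop problems sharing any exact n-gram with a benchmark problem.
        if ngrams(text, n) & bench_grams:
            continue
        # Step 2: drop problems whose embedding is within `threshold` of any benchmark embedding.
        if any(cosine_distance(emb, b) < threshold for b in bench_embs):
            continue
        kept.append(text)
    return kept
```

The embedding check compares every training problem with every benchmark problem, which is fine for a sketch; at the scale of 860k problems, a vector index would likely be needed.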

    After creating the NuminaMath-CoT dataset, extending it to TIR (tool-integrated reasoning) is straightforward. TIR data is sampled from GPT-4o following the same approach as ToRA, in particular its prompt. The process to create this TIR dataset is as follows:

    1. Extract a subset of roughly 100K problems with numerical outputs from the NuminaMath-CoT dataset.
    2. Sample a solution for each problem using the GPT-4o assistant API with a temperature of 0.8.
    3. Filter out negative samples, where the model-generated answer does not match the reference answer. For problems with integer outputs, exact match is used; for other expressions, a match is determined using GPT-4o as a judge.
    4. Repeat the same process on the negative problems.
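Steps 2 to 4 amount to rejection sampling with a retry pass. The Python sketch below assumes two hypothetical callables that are not real APIs: `sample_solution(problem)`, standing in for the GPT-4o call at temperature 0.8 and returning a `(trace, final_answer_or_None)` pair, and `llm_judge(pred, ref)`, standing in for GPT-4o as a judge. The number of retry rounds is also an assumption, since the source only says the process is repeated.

```python
from typing import Callable, Optional

Sampler = Callable[[str], tuple[str, Optional[str]]]
Judge = Callable[[str, str], bool]


def is_integer(s: Optional[str]) -> bool:
    if s is None:
        return False
    try:
        int(s.strip())
        return True
    except ValueError:
        return False


def answers_match(pred: Optional[str], ref: str, llm_judge: Judge) -> bool:
    """Step 3: exact match for integer references, otherwise defer to the judge."""
    if pred is None:
        return False
    if is_integer(ref):
        return is_integer(pred) and int(pred) == int(ref)
    return llm_judge(pred, ref)


def build_tir_dataset(
    problems: list[tuple[str, str]],
    sample_solution: Sampler,
    llm_judge: Judge,
    rounds: int = 2,
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Return (accepted samples, problems still unsolved after all rounds)."""
    accepted: list[dict] = []
    pending = list(problems)
    for _ in range(rounds):
        failed = []
        for problem, ref in pending:
            trace, pred = sample_solution(problem)  # step 2
            if answers_match(pred, ref, llm_judge):
                accepted.append({"problem": problem, "solution": trace})
            else:
                failed.append((problem, ref))  # a negative sample
        pending = failed  # step 4: retry only the negative problems
    return accepted, pending
```

Keeping integer comparison outside the judge means the comparatively expensive judge call is only made for non-integer answers.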

    Models are fine-tuned using a two-stage process inspired by the MuMath-Code paper.

    1. Fine-tuning on a large, diverse dataset of natural-language math problems and solutions, with CoT annotations to facilitate reasoning.
    2. Fine-tuning on a synthetic TIR dataset, in which problems are decomposed into rationales, Python programs, and their outputs.

    Models are trained at two scales:

    • 7B, based on DeepSeekMath-Base 7B
    • 72B, based on Qwen2-72B
    Hyperparameters used in the experiments.

    Tool-integrated reasoning (TIR)

    Each run of TIR begins with a problem, $x$. The goal of TIR is to sample a candidate solution, $y$. TIR starts by initializing a context, $c$, with an initial prompt $c_0$ containing only $x$. This context is then extended through up to $k$ rounds of interaction.

    On each iteration $i$, TIR uses a sampler, $S$, and an LLM, $\theta$, to sample text containing CoT and Python source code, $z_i$, until reaching the stop keyword $w_{\text{stop}}$ = "```output". After sampling $z_i$, TIR first checks whether a candidate solution has been generated, which would be wrapped in the keyword $w_{\text{answer}}$ = \boxed{}.

    If an answer is present, TIR applies a response parser, $R$, to the output, which sanitizes the text and returns only the final numerical response, with any units and other formatting removed. If no valid response is present, TIR checks whether any code has been generated by matching a regular expression against the Python region keyword $w_{\text{python}}$ = "```python(.*)```".
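The two checks described here, answer parsing followed by a fallback search for a Python region, might look like the Python sketch below. The cleanup rules in `parse_response` (dropping `\text{...}` units, degree signs, and escaped `$`/`%`) are illustrative assumptions, because the source does not list what $R$ strips. The names are hypothetical. The regex follows the $w_{\text{python}}$ pattern but makes `.*` non-greedy and adds `re.DOTALL` so it spans newlines and stops at the first closing fence.

```python
import re
from typing import Optional

# w_python: a fenced python region; DOTALL lets '.' cross newlines.
PYTHON_REGION = re.compile(r"```python(.*?)```", re.DOTALL)


def last_boxed(text: str) -> Optional[str]:
    """Contents of the last \\boxed{...}, with nested braces matched."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None


def parse_response(text: str) -> Optional[str]:
    """Response parser R (assumed rules): boxed answer minus units and formatting."""
    ans = last_boxed(text)
    if ans is None:
        return None
    ans = re.sub(r"\\text\{[^}]*\}", "", ans)  # e.g. \text{ cm}
    for token in ("^{\\circ}", "^\\circ", "\\%", "\\$"):
        ans = ans.replace(token, "")
    return ans.strip()


def extract_code(text: str) -> Optional[str]:
    """Return the last Python region, or None if no code was generated."""
    blocks = PYTHON_REGION.findall(text)
    return blocks[-1].strip() if blocks else None
```

Brace matching, rather than a single regex, keeps answers such as `\frac{1}{2}` intact.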

    If no such region is found, $z_i$ is discarded and TIR proceeds to the next iteration, resampling a fresh block of text. If such a region is found, the Python source code is passed to the Python interpreter, $I$, which parses and executes it. The result, $r_i$, of running $I(z_i)$ may include the output of print statements, or a truncated traceback if an exception was raised.

    The running context is then extended by setting $c_i = c_{i-1} \oplus z_i \oplus r_i$, where $\oplus$ denotes concatenation, and the next round of interaction begins. Thus, by the end of the interaction, $c = c_0 z_1 r_1 z_2 r_2 \ldots z_{\le k}$, where either a candidate answer, $y$, is successfully extracted from $z_{\le k}$, or else the error keyword $w_{\text{error}}$ is returned.
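The Python sketch below shows the whole loop, under several assumptions that differ from the NuminaMath implementation. `sample(context)` is a hypothetical stand-in for the sampler $S$ with model $\theta$, assumed to return text ending at the "```output" stop keyword. The interpreter $I$ is approximated by running code in a fresh `python` subprocess with a timeout, with no sandboxing, so only trusted code should be run this way. Answer detection is reduced to a non-nested \boxed{} regex. The 500-character traceback truncation and the `"<error>"` string used for $w_{\text{error}}$ are arbitrary choices.

```python
import re
import subprocess
import sys
from typing import Callable

BOXED = re.compile(r"\\boxed\{([^{}]*)\}")                   # w_answer (non-nested only)
PYTHON_REGION = re.compile(r"```python(.*?)```", re.DOTALL)  # w_python
ERROR = "<error>"                                            # placeholder for w_error


def run_python(code: str, timeout: float = 10.0, max_chars: int = 500) -> str:
    """Interpreter I: execute code in a fresh process (NOT sandboxed).

    Returns printed output followed by any traceback, truncated to the last
    max_chars characters, where the exception message usually appears.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError"
    return (proc.stdout + proc.stderr)[-max_chars:]


def tir(problem: str, sample: Callable[[str], str], k: int = 4) -> str:
    """Run up to k rounds of tool-integrated reasoning; return y or ERROR."""
    context = problem                    # c_0 contains only x
    for _ in range(k):
        z = sample(context)              # z_i, assumed to end at the ```output keyword
        answers = BOXED.findall(z)
        if answers:
            return answers[-1].strip()   # candidate answer y
        code = PYTHON_REGION.findall(z)
        if not code:
            continue                     # no answer and no code: discard z_i, resample
        r = run_python(code[-1])         # r_i = I(z_i)
        # c_i = c_{i-1} ⊕ z_i ⊕ r_i, closing the output fence after r_i
        context = context + z + "\n" + r + "```\n"
    return ERROR
```

Feeding the interpreter output back into the context is what lets the next sample react to printed results or error messages.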

    In SC-TIR, $n$ samples are generated with TIR. A filter, $F$, then removes ill-formed responses, and self-consistency majority voting is applied to the remaining answers.
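The final SC-TIR step is ordinary self-consistency voting. The Python sketch below assumes that each TIR run produces either an answer string or the error keyword (represented here by the placeholder `"<error>"`). It also uses a hypothetical filter $F$ that only drops errors and empty answers; the source does not define $F$'s exact rules.

```python
from collections import Counter
from typing import Iterable, Optional

ERROR = "<error>"  # placeholder for w_error


def is_well_formed(answer: str) -> bool:
    """Filter F (assumed rule): drop error keywords and empty answers."""
    return answer != ERROR and answer.strip() != ""


def sc_tir_vote(answers: Iterable[str]) -> Optional[str]:
    """Majority vote over the TIR answers that survive F; None if none survive."""
    kept = [a.strip() for a in answers if is_well_formed(a)]
    if not kept:
        return None
    # most_common orders equal counts by first occurrence, so ties go to the earliest answer.
    return Counter(kept).most_common(1)[0][0]
```

Filtering before voting prevents runs that failed with the error keyword from outvoting a smaller number of genuine answers.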

    Various 7B, 8B, and 70B parameter language models are compared on benchmarks including GSM8k (grade school math), MATH (math problem solving), AMC 2023 (competition-level math), and AIME 2024 (competition-level math).

    Comparison of various 7B and 8B parameter language models on different math benchmarks.
    • NuminaMath with TIR achieves state-of-the-art (SoTA) performance among 7B and 8B parameter models.
    Comparison of various open-weight and proprietary language models on different math benchmarks.
    • Models with TIR show significant improvements in problem-solving capability, especially on complex reasoning tasks.
    • NuminaMath with TIR also performs competitively against larger 70B-parameter models and proprietary models such as Claude 3.5 and GPT-4o, outperforming them on some benchmarks and approaching GPT-4o's performance on others.

    NuminaMath-1.5 is the second iteration of the NuminaMath dataset, designed to provide high-quality post-training data for competition-level math problems. It contains approximately 900k problems with Chain-of-Thought (CoT) solutions.

    The dataset is available on Hugging Face.

    Problem Metadata: answer, problem_type, and question_type metadata are included for all problems to ensure verifiable outputs.

    • answer: The final answer, or a special value such as "proof" or "notfound".
    • problem_type: The mathematical domain (Algebra, Geometry, Number Theory, etc.).
    • question_type: The problem style (multiple-choice, proof, math word problem).
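Because every record includes `answer`, `problem_type`, and `question_type`, problems whose outputs can be checked automatically are easy to select. The Python sketch below works on plain dictionaries. The records are invented examples that follow the field layout described above, and `is_verifiable` is a hypothetical helper. Loading the real dataset is omitted.

```python
from collections import Counter

NON_VERIFIABLE = {"proof", "notfound"}  # special answer values described above


def is_verifiable(record: dict) -> bool:
    """Keep records whose answer is a concrete value rather than a placeholder."""
    answer = str(record.get("answer", "")).strip()
    return answer != "" and answer.lower() not in NON_VERIFIABLE


records = [  # invented examples in the NuminaMath-1.5 field layout
    {"problem": "Compute 2+3.", "answer": "5",
     "problem_type": "Algebra", "question_type": "math word problem"},
    {"problem": "Prove there are infinitely many primes.", "answer": "proof",
     "problem_type": "Number Theory", "question_type": "proof"},
    {"problem": "Find x.", "answer": "notfound",
     "problem_type": "Geometry", "question_type": "math word problem"},
]

verifiable = [r for r in records if is_verifiable(r)]
by_domain = Counter(r["problem_type"] for r in verifiable)
```

Counting by `problem_type` after filtering shows how much verifiable data each domain contributes, for example when balancing a training mix.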

    New Data:

    • Olympiads Reference: Manually parsed and verified problems and solutions from the official websites of national Math Olympiads.
    • Manually Curated Data: Competition problems in cn_contest, inequalities, and number_theory.
    • Removed Data: The synthetic dataset synthetic_amc has been removed due to performance issues.

    NuminaMath: The largest public dataset in AI4Maths with 860k pairs of competition math problems and solutions


