
    Why Scaling Data Matters in Machine Learning (Without the Jargon) | by Ndhilani Simbine | Feb, 2025

    By Team_AIBS News | February 17, 2025 | 3 min read


    Have you ever tried to compare apples to oranges? Not metaphorically, but literally: one might weigh 200 grams, while the other weighs 150 grams. When I started working with machine learning, I quickly realized that this same issue happens with data. If I were building a model to predict fruit prices, the difference in weight would make apples seem far more important than oranges. That is exactly why I need feature scaling in machine learning!

    In this article, I'll break down why scaling is essential, how it helps models like logistic regression, and how I apply it, without diving too deep into the technical weeds.

    When I built a fraud detection system, I noticed some major discrepancies in my dataset:

    • Transaction Amount: Ranges from $1 to $100,000.
    • Payment Type: Either 0 or 1.
    • Balance Difference: Could be anywhere between -$10,000 and $50,000.

    In raw form, the transaction amount (which can be as high as $100,000) overpowered other features like payment type (which is only 0 or 1). This made my model favor features with large numbers and ignore those with small ones, leading to poor predictions.
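
To see why large raw values take over, here is a minimal sketch of one gradient step for logistic regression on a single transaction. The sample values, the label, and the zero starting weights are assumptions chosen for illustration; they are not taken from the fraud dataset described above. The gradient for each weight is the prediction error multiplied by that feature's value, so a feature measured in thousands receives an update thousands of times larger.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_gradient(x, y, weights):
    """Gradient of the log loss for one sample with respect to each weight."""
    z = sum(w * xi for w, xi in zip(weights, x))
    error = sigmoid(z) - y  # predicted probability minus true label
    return [error * xi for xi in x]

# Hypothetical sample: [amount, payment type, balance difference]
transaction = [5000.0, 1.0, 2000.0]
label = 1
weights = [0.0, 0.0, 0.0]  # assumed starting point

grad = logistic_gradient(transaction, label, weights)
print(grad)  # [-2500.0, -0.5, -1000.0]
```

The update for the amount weight is 5,000 times larger than the update for the payment type weight, purely because of the units the features are measured in.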

    To fix this, I apply feature scaling, which puts all features on a comparable scale. One popular method is standardization, which adjusts each feature so that:

    • The average value (mean) is 0
    • The spread of values (variance) is 1

    This allows all features, big or small, to contribute on an equal footing to the model's decision-making process.

    Amount    Payment Type    Balance Diff
    5000      1               2000
    100       1               50
    25000     1               5000

    Notice how "Amount" is much larger than the other values? That's a problem!

    Amount    Payment Type    Balance Diff
    0.2       0.0             -0.1
    -1.3      0.0             -1.2
    1.5       0.0             1.3

    Now, all of the values are in a related vary, guaranteeing a fairer comparability between options.

    Scaling improves my model in three key ways:

    • Without scaling, features with large numbers (like transaction amounts) dominate features with small ones (like payment type). Scaling puts every feature on a comparable footing.

    • Algorithms like logistic regression, neural networks, and support vector machines tend to perform better with scaled data because they optimize faster and are less prone to numerical instability.

    • My models often learn faster and converge more quickly when working with standardized data, saving me valuable time and computing power.
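
One rough way to see the optimization benefit is the condition number of the feature matrix, a common proxy for how hard a problem is for gradient-based optimizers (values near 1 are easy, huge values are hard). The sketch below uses NumPy with synthetic data drawn from the ranges listed earlier; the data, the random seed, and the `gram_condition_number` helper are assumptions for illustration, not the author's dataset or method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic features in the ranges quoted in the article (assumed uniform)
X_raw = np.column_stack([
    rng.uniform(1, 100_000, size=n),       # transaction amount
    rng.integers(0, 2, size=n),            # payment type (0 or 1)
    rng.uniform(-10_000, 50_000, size=n),  # balance difference
])

# Standardize each column: zero mean, unit variance
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

def gram_condition_number(X):
    """Condition number of X^T X / n; larger means a harder optimization."""
    return np.linalg.cond(X.T @ X / len(X))

cond_raw = gram_condition_number(X_raw)
cond_scaled = gram_condition_number(X_scaled)
print(f"raw: {cond_raw:.3g}, standardized: {cond_scaled:.3g}")
```

On this synthetic data the raw condition number is many orders of magnitude larger than the standardized one, which is consistent with the faster convergence described above.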

    If you're working with Python and want to scale your dataset, you can use the StandardScaler from the sklearn library:

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Example dataset: [amount, payment type, balance difference]
    features = [[5000, 1, 2000], [100, 1, 50], [25000, 1, 5000]]

    # Split into training and testing sets
    X_train, X_test = train_test_split(features, test_size=0.3)

    # Create a scaler and fit it to the training data only
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)

    # Reuse the same fitted scaler on the test data
    X_test_scaled = scaler.transform(X_test)

    By using .fit_transform() on the training data and .transform() on the test data, I ensure the model sees consistently scaled values throughout the learning process. This is important because fitting only on the training data prevents data leakage, where information from the test set influences the model's learning process, leading to overly optimistic performance estimates.
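
One way to make the fit-on-training-data-only rule harder to break is to bundle the scaler and the model into a scikit-learn pipeline, so the scaler is fitted whenever the model is fitted and never sees the test rows. This is a minimal sketch under stated assumptions: the data is synthetic, and the label rule (`amount > 50_000`) is hypothetical, included only so the model has something to learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic features in the ranges quoted in the article
amount = rng.uniform(1, 100_000, size=n)
payment_type = rng.integers(0, 2, size=n)
balance_diff = rng.uniform(-10_000, 50_000, size=n)
X = np.column_stack([amount, payment_type, balance_diff])

# Hypothetical label rule, for illustration only
y = (amount > 50_000).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The pipeline fits the scaler on X_train only, then trains the classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# At prediction time the stored training statistics are reused on X_test
accuracy = model.score(X_test, y_test)
fitted_scaler = model.named_steps["standardscaler"]
print(f"test accuracy: {accuracy:.2f}")
```

The fitted scaler's `mean_` attribute matches the column means of `X_train`, not of the full dataset, which is a quick check that no test-set information was used for scaling.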

    Feature scaling might sound technical, but it's a simple yet powerful step that can dramatically improve machine learning models. If you're working with datasets where numbers have vastly different ranges, scaling can be the difference between an accurate model and a misleading one.

    Next time you build a model, ask yourself: Are my features playing fair? If not, it's time to scale up! 🚀


