
    Winsorization: A Simple and Effective Way to Handle Outliers in Your Data | by Sugavasilakshmisahithi | Feb, 2025

    By Team_AIBS News | February 23, 2025 | 4 Mins Read


    Winsorization with a before-and-after comparison of outliers

    Winsorization is one of the simplest and most effective ways to handle outliers in a dataset. However, many people are unaware of this method or misunderstand how it works. In this blog, I'll explain what Winsorization is, when to use it, and why it's considered a simple approach. Let's dive in!

    Winsorization is a statistical technique used to handle outliers in a dataset. Contrary to what some might think, Winsorization doesn't remove outliers. Instead, it replaces the extreme values (outliers) with the nearest values that fall inside a specified percentile range. This reduces the impact of outliers without discarding any data points.
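To make the difference between removing and replacing concrete, here is a minimal sketch in plain NumPy. The helper name `winsorize_top_k` and the sample numbers are made up for illustration; it only handles the upper end and assumes `k` is smaller than the number of values.

```python
import numpy as np

def winsorize_top_k(values, k):
    """Replace the k largest values with the largest remaining value.

    Hypothetical helper for illustration; assumes 0 <= k < len(values).
    """
    values = np.asarray(values, dtype=float)
    result = values.copy()
    if k == 0:
        return result
    order = np.argsort(values)            # indices from smallest to largest
    cap = values[order[-k - 1]]           # largest value that is kept as-is
    result[order[-k:]] = cap              # overwrite the k largest values
    return result

data = np.array([3.0, 1.0, 100.0, 2.0, 4.0])

trimmed = np.sort(data)[:-1]              # trimming: the outlier is dropped
capped = winsorize_top_k(data, 1)         # winsorizing: the outlier is replaced

print("Trimmed   :", trimmed)
print("Winsorized:", capped)
```

Trimming leaves four values, while the Winsorized array still has five, with 100.0 replaced by 4.0.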

    Let's consider a scenario where you are working with battery data that consists of voltage, current, and time. The voltage values should ideally range between [1.94, 2.5], but due to some issues (e.g., sensor errors or anomalies), the voltage occasionally spikes to extreme values in the range [8, 10]. These extreme values are outliers and can negatively impact your model's ability to make accurate predictions.

    To address this, you can use Winsorization to replace these extreme values with less extreme ones, reducing their impact on the dataset and improving your model's performance.

    Here's how you can apply Winsorization to handle the extreme voltage values:

    import numpy as np
    from scipy.stats.mstats import winsorize

    # Example battery data: voltage, current, and time
    voltage = np.array([1.94, 2.0, 2.1, 2.2, 2.3, 2.5, 8.0, 9.5, 10.0, 2.4, 2.1, 1.95])
    current = np.array([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1])
    time = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

    # Define the acceptable voltage range
    voltage_range = [1.94, 2.5]

    # Identify extreme values
    extreme_values = (voltage < voltage_range[0]) | (voltage > voltage_range[1])
    print("Extreme Voltage Values:", voltage[extreme_values])

    # Apply Winsorization to replace extreme values.
    # limits=[lower, upper] is the fraction of points to replace at each end.
    # All three outliers are at the upper end, so we Winsorize the top 25%
    # (3 of 12 points) and leave the lower end untouched.
    # Note: with only 12 points, limits=[0.05, 0.05] would change nothing,
    # because int(0.05 * 12) == 0 points at each end.
    winsorized_voltage = winsorize(voltage, limits=[0.0, 0.25])

    # Print results
    print("Original Voltage:", voltage)
    print("Winsorized Voltage:", winsorized_voltage)

    1. Data Preparation:
    • The voltage array contains some extreme values ([8.0, 9.5, 10.0]) that fall outside the acceptable range [1.94, 2.5].
    • The current and time arrays are included for context but are not affected by Winsorization.

    2. Identify Extreme Values:

    • We define the acceptable range for voltage ([1.94, 2.5]) and identify values outside this range as extreme.

    3. Apply Winsorization:

    • The winsorize function from scipy.stats.mstats is used to replace the extreme values. In this example, we Winsorize the top 25% of the data (3 of 12 points) and nothing at the lower end, because all three outliers are high.
    • The function replaces the extreme values with the nearest value that is not being replaced, which here is 2.5.

    4. Results:

    • The original voltage array contains extreme values ([8.0, 9.5, 10.0]).
    • After Winsorization, these extreme values are replaced with 2.5, reducing their impact on the dataset.
    Extreme Voltage Values: [ 8.   9.5 10. ]
    Original Voltage: [ 1.94 2. 2.1 2.2 2.3 2.5 8. 9.5 10. 2.4 2.1 1.95]
    Winsorized Voltage: [1.94 2. 2.1 2.2 2.3 2.5 2.5 2.5 2.5 2.4 2.1 1.95]
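When the acceptable range is known in advance, as with the [1.94, 2.5] voltage bounds in this scenario, clipping directly to those bounds is a closely related option. The sketch below uses `np.clip` instead of percentile limits. This is fixed-bound clipping, not percentile-based Winsorization, and the names `clipped_voltage` and `n_changed` are only illustrative.

```python
import numpy as np

# Same voltage readings as in the example above
voltage = np.array([1.94, 2.0, 2.1, 2.2, 2.3, 2.5, 8.0, 9.5, 10.0, 2.4, 2.1, 1.95])

# Acceptable range taken from the scenario
low, high = 1.94, 2.5

# Values below `low` become `low`, values above `high` become `high`
clipped_voltage = np.clip(voltage, low, high)

# Count how many readings were actually modified
n_changed = int(np.sum(clipped_voltage != voltage))

print("Clipped Voltage:", clipped_voltage)
print("Values changed:", n_changed)
```

Clipping to known bounds does not depend on the sample size, whereas percentile limits are converted into a whole number of points and can round down to zero on small arrays.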

    Winsorization is particularly useful in situations where:

    1. Outliers are present but shouldn't be removed: In some cases, outliers contain valuable information, and removing them could lead to a loss of important insights. Winsorization lets you retain those data points while minimizing their impact.
    2. Data normalization is required: If you need to normalize data for statistical analysis or machine learning models, Winsorization can help by reducing the skewness caused by outliers.
    3. Robust statistical measures are needed: Winsorization can make statistical measures like the mean and standard deviation more robust to extreme values, providing a better representation of the central tendency and variability of the data.
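The effect on the mean and standard deviation can be checked directly. The sketch below compares the summary statistics of the original voltage readings with those of the Winsorized array shown in the article's output; the variable names are illustrative.

```python
import numpy as np

# Original readings and the Winsorized values from the article's output
voltage = np.array([1.94, 2.0, 2.1, 2.2, 2.3, 2.5, 8.0, 9.5, 10.0, 2.4, 2.1, 1.95])
winsorized_voltage = np.array([1.94, 2.0, 2.1, 2.2, 2.3, 2.5, 2.5, 2.5, 2.5, 2.4, 2.1, 1.95])

# Mean and (population) standard deviation before and after
mean_before, std_before = voltage.mean(), voltage.std()
mean_after, std_after = winsorized_voltage.mean(), winsorized_voltage.std()

# The median is shown for reference, as a statistic that is already robust
median_before = np.median(voltage)

print(f"Before: mean={mean_before:.3f}, std={std_before:.3f}, median={median_before:.3f}")
print(f"After : mean={mean_after:.3f}, std={std_after:.3f}")
```

Three spikes out of twelve readings are enough to pull the mean well above the acceptable range; after Winsorization the mean falls back inside it.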

    Winsorization is considered simple because:

    1. Easy to Implement: The process involves determining the percentiles and replacing the extreme values, which can be done with basic statistical functions in most programming languages (e.g., Python, R).
    2. No Data Loss: Unlike methods that remove outliers, Winsorization retains all data points, so the dataset keeps its size. Only the magnitudes of the replaced values change.
    3. Interpretable Results: The results of Winsorization are easy to interpret, as the data retains its original structure, but with reduced influence from extreme values.
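As a rough illustration of the "basic statistical functions" point, the percentile step and the replacement step can each be done with one NumPy call. This is a sketch, not the SciPy implementation: the function name `winsorize_by_percentile` and the sample readings are made up, and because `np.percentile` interpolates between data points by default, the caps may not be values that occur in the data.

```python
import numpy as np

def winsorize_by_percentile(values, lower_pct=5.0, upper_pct=95.0):
    """Cap values at the given lower and upper percentiles.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    # Step 1: determine the percentile cut-offs
    low, high = np.percentile(values, [lower_pct, upper_pct])
    # Step 2: replace anything outside the cut-offs with the cut-off itself
    return np.clip(values, low, high)

# Made-up readings with one obvious spike
readings = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 500], dtype=float)

capped = winsorize_by_percentile(readings, lower_pct=10, upper_pct=90)

print("Original:", readings)
print("Capped  :", capped)
```

The output has the same length as the input, and the spike at 500 is pulled down to the 90th-percentile value.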

    Winsorization is a powerful yet simple way to handle outliers in datasets. By replacing extreme values with the nearest acceptable values, it reduces their impact while preserving the overall integrity of the dataset. This makes it a good choice when dealing with outliers that shouldn't be removed but need to be managed for better analysis or modeling.

    With its easy implementation and no dropped data points, Winsorization is an effective and accessible tool for both beginner and experienced data scientists. Give it a try in your next project and see how it improves your results!


