
    The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics

    By Team_AIBS News, May 9, 2025


    This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. My first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public affairs.

    In this article, I go a bit deeper into how a misunderstanding of statistical concepts creates a breeding ground for being deceived by data. Specifically, I'll walk through how correlation, base proportions, summary statistics, and misinterpretation of uncertainty can lead people astray.

    Let's get right into it.

    Correlation ≠ Causation

    Let's start with a classic to get in the right frame of mind for some more complex ideas. From the earliest statistics classes in grade school, we're all told that correlation is not equal to causation.

    If you do a bit of Googling or reading, you can find "statistics" that show a high correlation between cigarette consumption and average life expectancy [1]. Interesting. Well, does that mean we should all start smoking to live longer?

    Of course not. We're missing a confounding factor: buying cigarettes requires money, and countries with greater wealth understandably have higher life expectancies. There is no causal link between cigarettes and longevity. I like this example because it's so blatantly misleading and highlights the point well. In general, it's important to be wary of any data that only shows a correlational link.

    From a scientific standpoint, a correlation can be identified through observation, but the most reliable way to establish causation is to conduct a randomized trial that controls for potential confounding factors, which is a fairly involved process.
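    To see why randomization matters, here is a hypothetical sketch in which wealth again drives life expectancy and a "treatment" truly does nothing. When a coin flip decides who is treated, wealth is balanced across the groups on average and the gap in outcomes sits near zero. When wealthier people self-select into treatment, a large spurious gap appears. The helper `group_gap` and all coefficients are illustrative assumptions.

```python
import random
import statistics

# Hypothetical setup: wealth drives life expectancy; the "treatment" has no effect.
random.seed(7)

n = 2000
wealth = [random.gauss(0, 1) for _ in range(n)]
life_expectancy = [70 + 5.0 * w + random.gauss(0, 2) for w in wealth]


def group_gap(outcome, in_group):
    """Mean outcome inside the group minus mean outcome of everyone else."""
    inside = statistics.fmean(o for o, g in zip(outcome, in_group) if g)
    outside = statistics.fmean(o for o, g in zip(outcome, in_group) if not g)
    return inside - outside


# Randomized trial: a coin flip, independent of wealth, assigns treatment.
randomized = [random.random() < 0.5 for _ in range(n)]
random_gap = group_gap(life_expectancy, randomized)

# Observational data: the wealthier half "chooses" the treatment.
self_selected = [w > 0 for w in wealth]
selected_gap = group_gap(life_expectancy, self_selected)

print(f"randomized gap:    {random_gap:.2f} years")
print(f"self-selected gap: {selected_gap:.2f} years")
```

    Only the randomized comparison recovers the true (zero) effect; the self-selected one attributes the benefits of wealth to the treatment.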

    I chose to start here because, while introductory, this concept also highlights a key idea that underpins understanding data effectively: the data only shows what it shows, and nothing else.

    Keep that in mind as we move forward.

    Remember Base Proportions

    In 1978, Dr. Stephen Casscells and his team famously asked a group of 60 physicians, residents, and students at Harvard Medical School the following question:

    "If a test to detect a disease whose prevalence is 1 in 1,000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?"

    Although presented in medical terms, this question is really about statistics, and accordingly it has direct connections to data science. Take a moment to think about your own answer before reading further.

    Photograph by Getty Images on Unsplash

    The answer is approximately 2% (assuming the test detects every person who truly has the disease). Now, if you looked through this quickly (and aren't up to speed with your statistics), you may have guessed significantly higher.

    This was certainly the case with the medical school participants. Only 11 of the 60 answered the question correctly, while 27 of the 60 went as high as 95% in their response (presumably just subtracting the false positive rate from 100%).

    It's easy to assume that the actual value should be high because of the positive test result, but this assumption contains a critical reasoning error: it fails to account for the extremely low prevalence of the disease in the population.

    Said another way, if only one in every 1,000 people has the disease, this must be taken into account when calculating the probability that a random person has the disease. The probability does not depend solely on the positive test result. As soon as the test accuracy falls below 100%, the base rate comes into play quite significantly.

    Formally, this reasoning error is known as the base rate fallacy.

    To see this more clearly, imagine that only one in every 1,000,000 people had the disease, but the test still had a false positive rate of 5%. Would you still assume that a positive test result immediately indicates a 95% chance of having the disease? What if it was 1 in a billion?

    Base rates are extremely important. Remember that.
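    For readers who want the arithmetic, this is Bayes' theorem applied to the question, where D means "has the disease" and + means "tests positive". It assumes, since the question leaves it unstated, that the test catches every true case, so P(+ | D) = 1:

```latex
P(D \mid +)
  = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
  = \frac{1 \times 0.001}{1 \times 0.001 + 0.05 \times 0.999}
  = \frac{0.001}{0.05095}
  \approx 0.0196 \approx 2\%
```

    The false positive term in the denominator is about fifty times larger than the true positive term, which is why the posterior is so small.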
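    Counting people instead of juggling percentages often makes the base rate's influence obvious. The sketch below screens a hypothetical population of one million people with the test from the question. It assumes a prevalence of 1 in 1,000, a 5% false positive rate, and (an assumption, since the question does not say) that every sick person tests positive.

```python
import random

# Hypothetical screening of one million people.
# Assumptions: prevalence 1/1,000, false positive rate 5%,
# and a test that catches every true case (sensitivity 100%).
random.seed(0)

PREVALENCE = 1 / 1_000
FALSE_POSITIVE_RATE = 0.05
N = 1_000_000

true_positives = 0
false_positives = 0
for _ in range(N):
    has_disease = random.random() < PREVALENCE
    if has_disease:
        true_positives += 1      # every sick person tests positive
    elif random.random() < FALSE_POSITIVE_RATE:
        false_positives += 1     # a healthy person wrongly flagged

share_sick = true_positives / (true_positives + false_positives)
print(f"sick: {true_positives}, healthy but positive: {false_positives}")
print(f"P(disease | positive) ≈ {share_sick:.3f}")
```

    Roughly 1,000 people are truly sick, but about 50,000 healthy people also test positive, so only around 2% of all positive results belong to someone with the disease.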
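    The same calculation can be wrapped in a small function to answer the "what if" questions directly. `posterior` is a hypothetical helper written for this illustration; its default `sensitivity=1.0` is an assumption, not part of the original question.

```python
def posterior(prevalence, false_positive_rate=0.05, sensitivity=1.0):
    """P(disease | positive test) via Bayes' theorem.

    sensitivity=1.0 assumes the test catches every true case.
    """
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


for label, prevalence in [
    ("1 in 1,000", 1e-3),
    ("1 in 1,000,000", 1e-6),
    ("1 in 1,000,000,000", 1e-9),
]:
    print(f"{label:>20}: {posterior(prevalence):.6%}")
```

    With the false positive rate held at 5%, the chance that a positive result is genuine drops from about 2% to about 0.002% and then to about 0.000002% as the disease becomes rarer.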

    Statistical Measures Are NOT Equal to the Data

    Let's take a look at the following quantitative data sets (13 of them, to be exact), all of which are visualized as scatter plots. One is even in the shape of a dinosaur.

    Image by author. Generated using code available under MIT license at https://jumpingrivers.github.io/datasauRus/

    Do you notice anything interesting about these data sets?

    I'll point you in the right direction. Here is a set of summary statistics for the data:

    X mean: 54.26
    Y mean: 47.83
    X SD (standard deviation): 16.76
    Y SD: 26.93
    Correlation: -0.06

    If you're wondering why there is only one set of statistics, it's because they're all the same. Every single one of the 13 charts above has the same mean, standard deviation, and correlation between variables (to two decimal places).

    This famous collection of 13 data sets is known as the Datasaurus Dozen [5], and was published some years ago as a stark example of why summary statistics can't always be trusted. It also highlights the value of visualization as a tool for data exploration. In the words of renowned statistician John Tukey,

    "The greatest value of a picture is when it forces us to notice what we never expected to see."
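    The Datasaurus data itself is not reproduced here, but the same effect shows up in a tiny one-dimensional example. The two lists below are hypothetical, chosen by hand so that their mean and (population) standard deviation match exactly, yet one is a single lump and the other has two separate clusters. A crude text histogram reveals what the summaries hide.

```python
import statistics
from collections import Counter

# Two small hypothetical data sets (not the Datasaurus data),
# hand-picked so that mean and population SD are identical.
unimodal = [2, 4, 4, 4, 5, 5, 7, 9]
bimodal = [3, 3, 3, 3, 7, 7, 7, 7]

for name, data in [("unimodal", unimodal), ("bimodal", bimodal)]:
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name:>8}: mean={mean:.2f}, sd={sd:.2f}")

# A crude text histogram exposes the difference in shape.
for name, data in [("unimodal", unimodal), ("bimodal", bimodal)]:
    counts = Counter(data)
    print(name)
    for value in range(min(data), max(data) + 1):
        print(f"  {value}: {'#' * counts[value]}")
```

    Both lists report a mean of 5.00 and a standard deviation of 2.00, but only plotting (even this crudely) shows that they describe very different distributions.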

    Understanding Uncertainty

    To conclude, I want to talk about a slight variation on deceptive data, but one that is equally important: mistrusting data that is actually correct. In other words, false deception.

    The following chart is taken from a study analyzing the sentiment of headlines from left-leaning, right-leaning, and centrist news outlets [6]:

    "Average yearly sentiment of headlines grouped by the ideological leanings of news outlets" by the authors of the study, David Rozado, Ruth Hughes, and Jamin Halberstadt, is licensed under CC BY 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/?ref=openverse.

    There's quite a bit going on in the chart above, but there is one particular aspect I want to draw your attention to: the vertical lines extending from each plotted point. You may have seen these before. Formally, these are called error bars, and they are a common way for scientists to depict uncertainty in the data.

    Let me stress this. In statistics and data science, "error" is synonymous with "uncertainty." Crucially, it does not mean that something is wrong or incorrect about what's being shown. When a chart depicts uncertainty, it depicts a carefully calculated measure of the range of a value and the level of confidence at various points within that range. Unfortunately, many people simply take it to mean that whoever made the chart is essentially guessing.

    This is a serious error in reasoning, because the damage is twofold: not only does the data at hand get misinterpreted, but the presence of this misconception also feeds the damaging societal belief that science is not to be trusted. Being upfront about the limitations of knowledge should actually increase our confidence in a claim's reliability, but mistaking that limitation for an admission of foul play leads to the opposite effect.

    Learning how to interpret uncertainty is challenging but incredibly important. At a minimum, a good place to start is knowing what the so-called "error" is actually trying to convey.
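    As an illustration of how such an interval is calculated rather than guessed, the sketch below computes one common kind of error bar, a 95% confidence interval for a mean, from a small set of made-up measurements. Error bars in published charts may instead show a standard deviation, a standard error, or a different interval, so the figure's legend or methods section is the authority. The normal approximation (z ≈ 1.96) used here is a simplifying assumption; for a sample this small, a t critical value would give a slightly wider interval.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical measurements, made up for illustration.
sample = [0.12, -0.05, 0.08, 0.15, 0.02, -0.01, 0.10, 0.06, 0.04, 0.09]

mean = statistics.fmean(sample)
# Standard error: sample standard deviation shrunk by the square root of n.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# 95% interval via the normal approximation (z is about 1.96).
z = NormalDist().inv_cdf(0.975)
low, high = mean - z * standard_error, mean + z * standard_error

print(f"mean = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

    Each quantity comes from the spread of the data and the sample size. More data or less noisy data yields a shorter bar, which is exactly the information an error bar is meant to convey.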
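    One concrete thing a 95% interval conveys is a long-run property of the method. The simulation below draws many samples from a hypothetical population whose true mean is known, builds a normal-approximation 95% interval from each, and counts how often the interval contains the truth. The population parameters, sample size, and number of trials are arbitrary assumptions for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical population with a known true mean.
random.seed(1)

TRUE_MEAN, TRUE_SD = 10.0, 3.0
SAMPLE_SIZE, TRIALS = 50, 2000
z = NormalDist().inv_cdf(0.975)

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    # Does this sample's interval capture the true population mean?
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"intervals containing the true mean: {coverage:.1%}")
```

    The coverage lands close to 95%: the stated confidence level is a checkable property of how the interval is built, not an admission that the analyst is guessing.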

    Recap and Closing Thoughts

    Here's a cheat sheet for being wary of deceptive data:

    • Correlation ≠ causation. Look for the confounding factor.
    • Remember base proportions. The probability of a phenomenon is highly influenced by its prevalence in the population, no matter how accurate your test is (unless it is 100% accurate, which is rare).
    • Beware summary statistics. Means and medians will only take you so far; you need to explore your data.
    • Don't misunderstand uncertainty. It isn't a mistake; it's a carefully considered description of confidence levels.

    Keep these in mind, and you'll be well positioned to tackle the next data science problem that comes your way.

    Until next time.

    References

    [1] How Charts Lie, Alberto Cairo

    [2] https://pmc.ncbi.nlm.nih.gov/articles/PMC4955674

    [3] https://data88s.org/textbook/content/Chapter_02/04_Use_and_Interpretation.html

    [4] https://visualizing.jp/the-datasaurus-dozen

    [5] https://dl.acm.org/doi/abs/10.1145/3025453.3025912

    [6] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0276367


