    How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes

By Team_AIBS News | July 16, 2025 | 9 Mins Read


    Overview

Paradoxes are not simply optical illusions or mind-bending puzzles. They can be logical, causing initial observations to collapse upon closer investigation. In data science, paradoxes arise when we take numbers at face value, without looking into the context behind them. One can have the sharpest visuals and still walk away with the wrong story.

In this article, we discuss three logical paradoxes that serve as cautionary tales for anyone who interprets data too quickly, without applying context. We explore how these paradoxes arise in Data Science and Business Intelligence (BI) use cases and then extend the insights to Retrieval-Augmented Generation (RAG) systems, where similar paradoxes can undermine the quality of both the prompt provided and the model's output.

Simpson's Paradox in Business Intelligence

Simpson's paradox describes the situation where trends reverse when data is aggregated. In other words, the trends you observe in subgroups get flipped when you combine the numbers and analyze them. Let's assume that we're analyzing the sales of four locations of a popular ice cream chain. When the sales for each location are analyzed individually, chocolate comes out as the customers' preferred flavor at most locations. But when the sales are added up, that trend goes away, and the combined results suggest that vanilla is preferred overall. This trend reversal is Simpson's Paradox. We use the fictional data below to demonstrate it.

Location | Chocolate | Vanilla | Total Customers | Chocolate % | Vanilla % | Winner
Suburb A |        15 |       5 |              20 |       75.0% |     25.0% | Chocolate
City B   |        33 |      27 |              60 |       55.0% |     45.0% | Chocolate
Mall     |      2080 |    1920 |            4000 |       52.0% |     48.0% | Chocolate
Airport  |      1440 |    2160 |            3600 |       40.0% |     60.0% | Vanilla
Total    |      3568 |    4112 |            7680 |       46.5% |     53.5% | Vanilla!
Sales by Store Location for a Fictitious Ice Cream Chain (By the Author)

Below is a visual representation.

Simpson's Paradox in BI Reporting – Illustration (Image by the Author)
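To see the reversal in code, here is a minimal sketch using pandas. It assumes nothing beyond the fictional counts in the table above.

```python
# Minimal sketch of the Simpson's paradox example above, using pandas.
# The location names and counts mirror the fictional table; no real data is used.
import pandas as pd

sales = pd.DataFrame(
    {
        "location": ["Suburb A", "City B", "Mall", "Airport"],
        "chocolate": [15, 33, 2080, 1440],
        "vanilla": [5, 27, 1920, 2160],
    }
)

# Per-location preference: chocolate wins at three of the four stores.
sales["chocolate_pct"] = sales["chocolate"] / (sales["chocolate"] + sales["vanilla"])
print(sales[["location", "chocolate_pct"]])

# Aggregated preference: the trend flips and vanilla comes out ahead, because
# the high-volume airport store dominates the combined totals.
total_choc = sales["chocolate"].sum()
total_van = sales["vanilla"].sum()
print(f"Overall chocolate share: {total_choc / (total_choc + total_van):.1%}")  # ~46.5%
```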

A data analyst who overlooks these subgroup dynamics may conclude that chocolate is underperforming. Hence, it is essential to break the numbers down by subgroup and check for the presence of Simpson's paradox. When a reversal in trend occurs, the next step is to identify the lurking variable, the hidden factor influencing group outcomes. In this case, the store location happens to be the lurking variable. A deep contextual understanding is required to interpret why vanilla sales were so high at the airport, flipping the overall result. Some questions that could guide the investigation are:

• Do airport shops stock fewer chocolate options?

• Do travelers prefer milder flavors?

• Was there a promotional campaign favoring vanilla at the airport stores?

Simpson's Paradox in RAG Systems

Let's suppose that you have a RAG (Retrieval-Augmented Generation) system that gauges public sentiment towards electric vehicles (EVs) and answers questions on the topic. The model uses news articles from 2010 to 2024. Until 2016, EVs received mixed reviews due to their limited range, higher purchase price, and lack of charging stations, all of which made long-distance driving in an EV all but impossible. Newspaper reports before 2017 tended to highlight these deficiencies. From 2017 onwards, however, EVs started being perceived in a good light thanks to improvements in performance and the availability of charging stations, particularly after the successful launch of Tesla's premium EV. A RAG system that draws on news reports from 2010 to 2024 would likely give contradictory responses to similar questions, which triggers Simpson's Paradox.

    For example, if the RAG is requested, “Is EV adoption within the US nonetheless low?”, the reply could be “Sure, adoption stays low on account of excessive shopping for prices and restricted infrastructure”. If the RAG is requested, “Has EV adoption elevated not too long ago within the U.S.?”, the reply can be ‘Sure, adoption has elevated tremendously on account of developments in know-how and charging infrastructure’. On this case, the lurking variable is the publication date. A sensible repair to this subject is to tag paperwork (articles) into time-based bins in the course of the pre-processing part. Different choices embody encouraging the customers to specify a time vary of their immediate (e.g. Within the final 5 years, how has the adoption of EV been?) or fine-tuning the LLM to explicitly state the timeline that it’s contemplating for its response (e.g., Round 2024, EV Adoption has elevated tremendously.).

Simpson's Paradox in RAG Systems (Image by the Author)
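As a rough illustration of the time-binning fix mentioned above, the sketch below tags each article with a time bin during pre-processing. The Document structure, the tag_time_bin helper, and the 2017 cutoff are assumptions made for illustration; they are not tied to any specific RAG framework.

```python
# Hypothetical sketch: attach a time bin to each article's metadata during
# pre-processing so retrieval can filter or re-rank by era. The document
# schema and the 2017 cutoff are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    year: int
    metadata: dict = field(default_factory=dict)

def tag_time_bin(doc: Document, cutoff_year: int = 2017) -> Document:
    """Label each article as pre- or post-cutoff so retrieval can filter by era."""
    doc.metadata["time_bin"] = "pre_2017" if doc.year < cutoff_year else "2017_onwards"
    doc.metadata["year"] = doc.year
    return doc

corpus = [
    Document("EV range anxiety and sparse charging stations limit adoption.", 2014),
    Document("Charging networks expand; EV sales hit a new record.", 2023),
]
corpus = [tag_time_bin(d) for d in corpus]

# At query time, the retriever could restrict or re-rank by time_bin, e.g. keep
# only "2017_onwards" documents for questions about recent adoption.
recent_docs = [d for d in corpus if d.metadata["time_bin"] == "2017_onwards"]
print(len(recent_docs))  # 1
```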

Accuracy Paradox in Data Science Problems

The crux of the Accuracy Paradox is that high accuracy isn't necessarily indicative of a useful output. Let's assume that you're building a classification model to identify whether a patient has a rare disease that affects just one person in 100. The model correctly identifies and labels everyone who does not have the disease and thereby achieves 99% accuracy. However, it fails to identify the one person who has the disease and needs urgent medical attention. The model thus becomes useless for detecting the disease, which is its very purpose. This occurs especially with imbalanced datasets, where observations for one class are minimal. This is illustrated in the figure below.

Accuracy Paradox in Data Science Problems (Image by the Author)

The best way to tackle the Accuracy Paradox is to use metrics that capture the performance of the minority classes, such as precision, recall, and F1-score. Another approach is to treat imbalanced datasets as anomaly detection problems rather than classification problems. One could also consider collecting more minority-class data (if possible), over-sampling the minority class, or under-sampling the majority class. Below is a quick guide that helps determine which metric to use depending on the use case, objective, and consequences of errors.

Choosing the Right Metric for Your Model's Performance Measurement (Image by the Author)
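The paradox is easy to reproduce. The sketch below uses scikit-learn metrics on a synthetic 1-in-100 dataset mirroring the rare-disease example above: a model that predicts "no disease" for everyone scores 99% accuracy but zero recall, precision, and F1.

```python
# Minimal sketch of the accuracy paradox on synthetic, imbalanced data.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1] + [0] * 99          # one patient in 100 actually has the disease
y_pred = [0] * 100               # the model labels everyone as healthy

print("Accuracy :", accuracy_score(y_true, y_pred))                    # 0.99
print("Recall   :", recall_score(y_true, y_pred, zero_division=0))     # 0.0
print("Precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("F1-score :", f1_score(y_true, y_pred, zero_division=0))         # 0.0
```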

    Accuracy Paradox in LLMs

While the Accuracy Paradox is a common issue that data scientists routinely handle, its implications for LLMs are largely overlooked. The accuracy metric can dangerously overpromise in use cases that involve safety, toxicity detection, and bias mitigation. High accuracy does not mean that a model is fair and safe to use. For example, an LLM with 98% accuracy is of no use if it misclassifies two malicious prompts as safe and harmless. Hence, in LLM evaluations, it is a good idea to use recall, precision, or PR-AUC over accuracy, as they indicate how well the model handles minority classes.
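As a hedged example, the sketch below evaluates a hypothetical prompt-safety classifier on made-up, imbalanced data: accuracy looks reassuring while recall and PR-AUC (average precision) expose the missed malicious prompt.

```python
# Sketch: evaluating a prompt-safety classifier with recall and PR-AUC rather
# than accuracy alone. All labels and scores below are illustrative.
from sklearn.metrics import accuracy_score, recall_score, average_precision_score

# 1 = malicious prompt (rare), 0 = benign
y_true   = [0] * 48 + [1, 1]
y_scores = [0.1] * 47 + [0.6] + [0.9, 0.4]   # model's predicted probability of "malicious"
y_pred   = [int(s >= 0.5) for s in y_scores]

print("Accuracy:", accuracy_score(y_true, y_pred))                # 0.96 -- looks great
print("Recall  :", recall_score(y_true, y_pred))                  # 0.5  -- half the malicious prompts slip through
print("PR-AUC  :", average_precision_score(y_true, y_scores))     # ~0.83 -- a more honest picture
```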

Goodhart's Law in Business Intelligence

Economist Charles Goodhart observed that "When a measure becomes a target, it ceases to be a good measure." This law is a subtle reminder that if you over-optimize a metric without understanding the implications and context, the effort will backfire.

Consider a manager at a fictitious online news agency who sets a KPI for his team: increase session duration by 20%. The team artificially extends introductions and adds filler content to stretch session duration. Session duration goes up, but video quality suffers, and as a result the value that users get from the videos diminishes.

Another example relates to customer churn. In an attempt to reduce churn, a subscription-based entertainment app places the 'Unsubscribe' button in a hard-to-find location in its web portal. As a result, customer churn decreases, but not because of improved customer satisfaction. It is only because of restricted exit options, an illusion of customer retention. Below is a visual illustration of how efforts to meet or exceed growth targets (such as increasing session duration or user engagement) can have unintended consequences and degrade user experience. When teams resort to artificially inflating performance metrics, the metric improvements look good on paper but are not meaningful in any way.

Goodhart's Law – Illustration (Image by the Author)

Goodhart's Law in LLMs

When you train an LLM too heavily on a particular dataset (especially a benchmark), it can start memorizing patterns from that training data instead of learning to generalize. This is a classic example of overfitting, where the model performs extremely well on the training data but poorly on real-world inputs.

Let's assume that you're training an LLM to summarize news articles and you use the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric to evaluate its performance. ROUGE rewards exact or near-exact n-gram matches with the reference summaries. Over time, the LLM starts copying large chunks of text from the input articles to obtain a higher ROUGE score. It also leans on buzzwords that appear frequently in reference summaries. Suppose the input article contains the text "The bank increased interest rates to curb inflation, and this caused stock prices to decline sharply." The overfit model would summarize it as "The bank increased interest rates to curb inflation", while a generalizing model would summarize it as "The interest rate hike triggered a decline in the stock markets". The illustration below demonstrates how over-optimizing a model for an evaluation metric can lead to low-quality responses, ones that look good on paper but are not useful.

Goodhart's Law in LLMs (Image by the Author)
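To make the ROUGE example concrete, here is a minimal sketch that scores a copy-heavy summary and a faithful paraphrase against a reference that echoes the source wording. It assumes the third-party rouge-score package (pip install rouge-score); the reference text and both candidate summaries are purely illustrative.

```python
# Sketch: why optimizing for ROUGE can reward copying. Uses the `rouge-score`
# package; the reference summary is assumed to echo the source wording.
from rouge_score import rouge_scorer

reference = "The bank increased interest rates to curb inflation."

copied_summary      = "The bank increased interest rates to curb inflation"              # lifted from the source
abstractive_summary = "The interest rate hike triggered a decline in the stock markets"  # faithful paraphrase

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for name, summary in [("copied", copied_summary), ("abstractive", abstractive_summary)]:
    scores = scorer.score(reference, summary)
    print(name, {k: round(v.fmeasure, 2) for k, v in scores.items()})

# The copy-heavy summary scores near-perfect ROUGE, while the paraphrase scores
# far lower even though it also captures the cause-and-effect relationship.
```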

    Concluding Remarks

Whether in business intelligence or in LLMs, paradoxes can creep in whenever numbers and metrics are handled without the underlying nuance and context. It is also important to remember that overfitting to a metric can damage the bigger picture. Combining quantitative analysis with human insight is key to avoiding these pitfalls and to building reliable reports and powerful LLMs that truly deliver value.


