Your Classifier Is Broken, But It Is Still Useful

by David Lindelöf, January 12, 2025



    Towards Data Science

Once you run a binary classifier over a population, you get an estimate of the proportion of true positives in that population. This is called the prevalence.

Photo by Rod Long on Unsplash

But that estimate is biased, because no classifier is perfect. For example, if your classifier tells you that you have 20% of positive cases, but its precision is known to be only 50%, you would expect the true prevalence to be 0.2 × 0.5 = 0.1, i.e. 10%. But that assumes perfect recall (all true positives are flagged by the classifier). If the recall is less than 1, then you know the classifier missed some true positives, so you also need to normalize the prevalence estimate by the recall.

This leads to the common formula for getting the true prevalence Pr(y=1) from the positive prediction rate Pr(ŷ=1):

Pr(y=1) = precision × Pr(ŷ=1) / R
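To make the arithmetic concrete, here is that example as a minimal R sketch (the 20% positive rate and 50% precision are the figures quoted above; the recall of 0.8 is an assumed value for illustration):

# naive correction of the positive prediction rate by precision and recall
positive_rate <- 0.2  # the classifier flags 20% of cases
precision <- 0.5      # only half of the flagged cases are true positives
recall <- 0.8         # assumed: the classifier catches 80% of true positives

# true prevalence = precision × positive prediction rate / recall
precision * positive_rate / recall  # 0.125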

But suppose that you want to run the classifier more than once. For example, you may want to do this at regular intervals to detect trends in the prevalence. You can't use this formula anymore, because precision depends on the prevalence. To use the formula above you would have to re-estimate the precision regularly (say, with human evaluation), but then you might just as well re-estimate the prevalence itself.

How do we get out of this circular reasoning? It turns out that binary classifiers have other performance metrics (besides precision) that don't depend on the prevalence. These include not only the recall R but also the specificity S, and these metrics can be used to adjust Pr(ŷ=1) to get an unbiased estimate of the true prevalence using this formula (sometimes known as prevalence adjustment):

Pr(y=1) = (Pr(ŷ=1) − (1 − S)) / (R − (1 − S))

where:

• Pr(y=1) is the true prevalence
• S is the specificity
• R is the sensitivity, or recall
• Pr(ŷ=1) is the proportion of positive predictions

The proof is simple: decompose the positive prediction rate over the true classes,

Pr(ŷ=1) = Pr(ŷ=1 | y=1) Pr(y=1) + Pr(ŷ=1 | y=0) Pr(y=0)
        = R · Pr(y=1) + (1 − S) · (1 − Pr(y=1))

Solving for Pr(y=1) yields the formula above.
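As a minimal sketch, the adjustment can be wrapped in a small R helper (the function name adjust_prevalence is mine, not from the article):

# prevalence adjustment: recover Pr(y=1) from the positive prediction rate,
# the recall R and the specificity S (helper name is illustrative)
adjust_prevalence <- function(positive_rate, recall, specificity) {
  fpr <- 1 - specificity        # false positive rate, 1 - S
  denominator <- recall - fpr   # breaks down when R equals 1 - S
  if (abs(denominator) < 1e-8) {
    stop("recall equals the false positive rate: classifier is non-informative")
  }
  (positive_rate - fpr) / denominator
}

# example: 20% of cases flagged, recall 0.8, specificity 0.9
adjust_prevalence(0.2, recall = 0.8, specificity = 0.9)  # ≈ 0.143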

Notice that this formula breaks down when the denominator R − (1 − S) becomes 0, i.e. when the recall equals the false positive rate 1 − S. But remember what a typical ROC curve looks like:

    From https://en.wikipedia.org/wiki/Receiver_operating_characteristic#/media/File:Roccurves.png

An ROC curve like this one plots recall R (aka the true positive rate) against the false positive rate 1 − S, so a classifier for which R = 1 − S falls on the diagonal of the ROC diagram. This is a classifier that is, essentially, guessing at random: true cases and false cases are equally likely to be labeled positive, so the classifier is completely non-informative and you can't learn anything from it, certainly not the true prevalence.
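You can check this degenerate case numerically with a simulated coin-flip classifier (my own illustration, not from the article):

# a coin-flip classifier: flags cases at random, ignoring the true label
set.seed(1)
y <- runif(10000) < 0.3       # true prevalence is 30%
y_hat <- runif(10000) < 0.5   # predictions are random guesses

recall <- mean(y_hat[y])           # ≈ 0.5
specificity <- mean(!y_hat[!y])    # ≈ 0.5
recall - (1 - specificity)         # ≈ 0: the denominator vanishes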

Enough theory, let's see if this works in practice:

# randomly draw some covariate
x <- runif(10000, -1, 1)

# map through the inverse logit and draw the outcome
logit <- plogis(x)
y <- runif(10000) < logit

# fit a logistic regression model
m <- glm(y ~ x, family = binomial)

# make some predictions, using an absurdly low threshold
# (deliberately a poor classifier: flag cases whose predicted
# probability is below 0.3)
y_hat <- predict(m, type = "response") < 0.3

# get the recall (aka sensitivity) and specificity
cm <- caret::confusionMatrix(factor(y_hat), factor(y), positive = "TRUE")
recall <- unname(cm$byClass['Sensitivity'])
specificity <- unname(cm$byClass['Specificity'])

# get the adjusted prevalence
(mean(y_hat) - (1 - specificity)) / (recall - (1 - specificity))

# compare with the actual prevalence
mean(y)

In this simulation I get recall = 0.049 and specificity = 0.875. The predicted prevalence is a ridiculously biased 0.087, but the adjusted prevalence is essentially equal to the true prevalence (0.498).

To sum up: this shows how, using a classifier's recall and specificity, you can adjust the predicted prevalence to track it over time, assuming that recall and specificity are stable over time. You cannot do this using precision and recall, because precision depends on the prevalence while recall and specificity do not.
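As a sketch of that tracking workflow, you could hold the estimated recall and specificity fixed and apply the adjustment to each new batch of predictions (the recall, specificity and per-batch positive prediction rates below are invented for illustration):

# track prevalence over time with recall and specificity held fixed
recall <- 0.75
specificity <- 0.90

# positive prediction rates observed in successive batches
positive_rate <- c(0.16, 0.18, 0.21, 0.25)

# adjusted prevalence for each batch
(positive_rate - (1 - specificity)) / (recall - (1 - specificity))
# ≈ 0.092 0.123 0.169 0.231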


