
    Understanding Confusion Matrix and Evaluation Metrics in Classification Problems | by Debisree Ray | Apr, 2025

    By Team_AIBS News | April 29, 2025 | 4 min read


    Unlocking the Secrets of Your Classifier’s Successes and Failures

    When working on a classification problem, one of the most important steps after training the model is model evaluation. Choosing the right evaluation metric can significantly affect how you interpret your model’s performance and how you improve it. At the heart of most evaluation methods lies the confusion matrix.

    In this blog, we’ll explore:

    • What is a confusion matrix?
    • Key metrics derived from it: Accuracy, Precision, Recall, F1 Score
    • Type I and Type II Errors
    • When to focus on which metric
    • ROC Curve and AUC Score
    • Example Python code

    A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the actual values are known.

    For a binary classification problem:

                        Predicted Positive     Predicted Negative
    Actual Positive     True Positive (TP)     False Negative (FN)
    Actual Negative     False Positive (FP)    True Negative (TN)

    • True Positive (TP): The model predicted a positive outcome, and the actual label was also positive. (Example: The model correctly identifies a patient with a disease.)
    • True Negative (TN): The model predicted a negative outcome, and the actual label was also negative. (Example: The model correctly identifies a healthy patient.)
    • False Positive (FP): The model predicted a positive outcome, but the actual label was negative. (Example: The model wrongly says a healthy patient has a disease.)
    • False Negative (FN): The model predicted a negative outcome, but the actual label was positive. (Example: The model wrongly says a sick patient is healthy.)
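
    To make the four cells concrete, here is a minimal sketch that counts them by hand. The label lists and the helper name confusion_counts are made up for illustration, and labels are assumed to be coded as 1 (positive) and 0 (negative).

```python
# Hypothetical labels: 1 = positive (sick), 0 = negative (healthy).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, actually positive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted negative, actually negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, actually negative
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, actually positive
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```

    Every prediction lands in exactly one cell, so the four counts always add up to the number of samples.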

    From this matrix, several important evaluation metrics can be calculated.

    Accuracy

    • Definition: (TP + TN) / (TP + FP + FN + TN)
    • Interpretation: How often the classifier is correct.

    Drawback: Accuracy can be misleading if the dataset is imbalanced.
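
    A quick sketch of why that happens, using a made-up test set with 950 negatives and 50 positives and a "model" that always predicts negative:

```python
# Hypothetical imbalanced test set: 950 negatives (0) and 50 positives (1).
y_true = [0] * 950 + [1] * 50

# A degenerate "classifier" that ignores its input and always predicts negative.
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# How many of the actual positives did it catch?
positives_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"Accuracy: {accuracy:.2f}")                   # Accuracy: 0.95
print(f"Positives caught: {positives_caught} of 50")  # Positives caught: 0 of 50
```

    The accuracy looks strong at 95%, yet the model never finds a single positive case.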

    Precision

    • Definition: TP / (TP + FP)
    • Interpretation: When the model predicts the positive class, how often is it correct?

    Real-world example: In spam detection, high precision means that emails classified as spam are indeed spam. Low precision would cause many important, legitimate emails to be wrongly filtered as spam (false positives).

    High precision means few false positives among the predicted positives.
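
    As a small worked example, the sketch below computes precision from raw counts. The spam counts are invented, and returning 0.0 when nothing was predicted positive is just one convention chosen here (strictly, the ratio is undefined in that case).

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical spam filter results for one day's mail.
flagged_and_spam = 90    # true positives
flagged_but_legit = 10   # false positives

print(precision(flagged_and_spam, flagged_but_legit))  # 0.9
```

    Here 9 out of every 10 flagged emails really are spam, and the remaining 1 in 10 is a legitimate email lost to the filter.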

    Recall

    • Definition: TP / (TP + FN)
    • Interpretation: Out of all actual positives, how many were correctly identified?

    Real-world example: In cancer detection, high recall is essential because we want to catch as many actual cancer cases as possible. Missing a positive case (false negative) could be dangerous.

    High recall means a low false negative rate.
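
    The link between recall and the false negative rate can be checked with a few lines. The screening counts below are hypothetical, and the function names are chosen only for this sketch.

```python
def recall(tp, fn):
    """TP / (TP + FN): share of actual positives that were found."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


def false_negative_rate(tp, fn):
    """FN / (TP + FN): share of actual positives that were missed."""
    actual_positive = tp + fn
    return fn / actual_positive if actual_positive else 0.0


# Hypothetical screening results: 50 patients actually have the disease.
detected = 45   # true positives
missed = 5      # false negatives

r = recall(detected, missed)
fnr = false_negative_rate(detected, missed)
print(f"Recall: {r:.2f}, False negative rate: {fnr:.2f}")  # Recall: 0.90, False negative rate: 0.10
```

    The two numbers always sum to 1, so pushing recall up is the same thing as pushing the false negative rate down.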

    F1 Score

    • Definition: 2 * (Precision * Recall) / (Precision + Recall)
    • Interpretation: Harmonic mean of precision and recall.

    F1 balances precision and recall, which is helpful when you have imbalanced classes.
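
    To see what the harmonic mean buys you, compare it with a plain average on a lopsided model. The precision and recall values here are made up for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical model: very precise, but it misses most positives.
p, r = 0.9, 0.1

arithmetic_mean = (p + r) / 2
f1_score_value = f1(p, r)

print(f"Arithmetic mean: {arithmetic_mean:.2f}")  # Arithmetic mean: 0.50
print(f"F1 score:        {f1_score_value:.2f}")   # F1 score:        0.18
```

    The plain average hides the poor recall, while F1 is dragged toward the weaker of the two numbers.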

    • Type I Error (False Positive): Incorrectly identifying a negative instance as positive.
    • Example: Flagging a legitimate email as spam.
    • Type II Error (False Negative): Incorrectly identifying a positive instance as negative.
    • Example: Failing to detect a fraudulent transaction.

    Understanding the cost of these errors is essential. In some cases, such as fraud detection or disease screening, Type II errors are far more costly than Type I errors.
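
    One simple way to reason about unequal error costs is to attach a price to each error type and compare models on total cost. The sketch below assumes, purely for illustration, that a missed case costs ten times as much as a false alarm; the two models and their error counts are invented.

```python
COST_FP = 1    # hypothetical cost of a false alarm (Type I error)
COST_FN = 10   # hypothetical cost of a missed case (Type II error)


def total_cost(fp, fn, cost_fp=COST_FP, cost_fn=COST_FN):
    """Weighted sum of the two error types."""
    return fp * cost_fp + fn * cost_fn


# Two hypothetical models evaluated on the same test set.
model_a = {"fp": 5, "fn": 20}   # 25 errors in total
model_b = {"fp": 40, "fn": 5}   # 45 errors in total

cost_a = total_cost(**model_a)
cost_b = total_cost(**model_b)
print(f"Model A cost: {cost_a}")  # Model A cost: 205
print(f"Model B cost: {cost_b}")  # Model B cost: 90
```

    Model B makes almost twice as many mistakes, yet it is the cheaper model once missed cases are priced in.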

    The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (also known as Recall) against the False Positive Rate (1 - Specificity) at various threshold settings.

    • The closer the ROC curve is to the top-left corner, the better the model.
    • The Area Under the Curve (AUC) quantifies the model’s overall ability to discriminate between positive and negative classes.

    Interpretation:

    • AUC = 1: Perfect model
    • AUC = 0.5: No discrimination (random guess)
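
    To show where the curve comes from, here is a rough pure-Python sketch that sweeps a threshold over the predicted scores, records one (FPR, TPR) point per threshold, and integrates with the trapezoid rule. The helper names and the tiny dataset are made up for illustration; in practice you would use a library routine rather than this loop.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc = auc_trapezoid(points)
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc)     # 0.75
```

    Each threshold gives a different trade-off between catching positives and raising false alarms; AUC summarizes all of those trade-offs in a single number.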

    Let’s generate and evaluate a classification model using Scikit-learn.

    import numpy as np
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, roc_curve
    import matplotlib.pyplot as plt

    # Generate dummy dataset
    X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

    # Split dataset into training (70%) and test (30%) sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Train a model
    clf = RandomForestClassifier()
    clf.fit(X_train, y_train)

    # Predict
    y_pred = clf.predict(X_test)

    # Metrics
    cm = confusion_matrix(y_test, y_pred)
    print("Confusion Matrix:\n", cm)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall:", recall_score(y_test, y_pred))
    print("F1 Score:", f1_score(y_test, y_pred))

    # ROC Curve
    y_probs = clf.predict_proba(X_test)[:, 1]  # Probabilities for the positive class
    fpr, tpr, thresholds = roc_curve(y_test, y_probs)
    roc_auc = roc_auc_score(y_test, y_probs)

    plt.figure()
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')  # diagonal = random guessing
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic')
    plt.legend(loc="lower right")
    plt.show()
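
    One detail worth knowing when reading the printed matrix: for 0/1 labels, scikit-learn's confusion_matrix puts the negative class first, so the layout is [[TN, FP], [FN, TP]] rather than the positive-first table shown earlier. A small sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)

# Rows are actual classes, columns are predicted classes, both in sorted label order (0, 1).
tn, fp, fn, tp = (int(v) for v in cm.ravel())
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
```

    Unpacking with ravel() in this order is a handy way to avoid mixing up the cells when computing metrics by hand.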

    The confusion matrix is a powerful tool that gives deeper insight into not only how often your classifier is correct, but also which kinds of errors it makes when it is wrong.

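    Choosing the right metric depends heavily on the use case: favor precision when false positives are costly (as in spam filtering), recall when false negatives are costly (as in cancer detection), and F1 when you need a balance on imbalanced classes.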

    Next time you work on a classification problem, go beyond accuracy. Look into confusion matrices, think about your use case, and pick the evaluation metric accordingly!

    Happy Modeling! 🚀


