    Partial Dependence Plots: How to Discover Variables Influencing a Model | by Mythili Krishnan

By Team_AIBS News | January 1, 2025 | 5 Mins Read


We’ll now use the code below to train the random forest model.

# Train the RF model

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

rf_model = RandomForestClassifier(n_estimators=100, random_state=1).fit(train_x, train_y)

pred_y = rf_model.predict(test_x)

cm = confusion_matrix(test_y, pred_y)
print(cm)
print(accuracy_score(test_y, pred_y))

The output of the random forest model is given below:

The random forest model has a slightly better accuracy at ~50%, with (13+12) targets identified correctly and (14+11) targets misclassified: 14 being false positives and 11 being false negatives.
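As a quick sanity check, the ~50% figure follows directly from those confusion-matrix counts (a minimal sketch; the counts are taken from the output above):

```python
# Accuracy implied by the confusion-matrix counts quoted above
correct = 13 + 12          # correctly identified targets
wrong = 14 + 11            # 14 false positives + 11 false negatives
accuracy = correct / (correct + wrong)
print(accuracy)            # 0.5, i.e. ~50%
```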

We’ll now look at the most influential variables in both models and how they affect the accuracy. We’ll use ‘PermutationImportance’ from the ‘eli5’ library for this purpose. We can do this with just a few lines of code, as given below:

# Import PermutationImportance from the eli5 library

import eli5
from eli5.sklearn import PermutationImportance

# Influential variables for the decision tree model

perm = PermutationImportance(dt_model, random_state=1).fit(test_x, test_y)
eli5.show_weights(perm, feature_names=test_x.columns.tolist())

The influential variables in the decision tree model are:

The most influential variables in the decision tree model are ‘1st Goal’, ‘Distance covered’ and ‘Yellow Card’, among others. There are also variables that influence the accuracy negatively, such as ‘Ball possession %’ and ‘Pass accuracy %’. Some variables, such as ‘Red Card’ and ‘Goal scored’, have no influence on the accuracy of the model.

The influential variables in the random forest model are:

The most influential variables in the random forest model are ‘Ball possession %’, ‘Free Kicks’, ‘Yellow Card’ and ‘Own Goals’, among others. There are also variables that influence the accuracy negatively, such as ‘Red Card’ and ‘Offsides’; hence we can drop these variables from the model to increase the accuracy.

The weights indicate the percentage by which the model accuracy is impacted when the values of a variable are re-shuffled. For example: by using the feature ‘Ball possession %’, the model accuracy can be improved by 5.20%, within a range of ±5.99%.
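Under the hood, the idea behind permutation importance is simple: shuffle one feature’s values, re-score the model, and see how much the accuracy drops. A minimal sketch on synthetic data (the football dataset and its columns are not reproduced here, so the data and numbers below are illustrative only):

```python
# Sketch of permutation importance on synthetic data:
# only column 0 carries signal, so shuffling it should hurt accuracy most.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)          # target depends only on feature 0

model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)
base = accuracy_score(y, model.predict(X))

drops = []
for col in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, col] = rng.permutation(X_perm[:, col])   # shuffle one feature
    drops.append(base - accuracy_score(y, model.predict(X_perm)))
    print(f"feature {col}: accuracy drop {drops[-1]:.3f}")
```

Shuffling the informative feature produces a large accuracy drop; shuffling the noise features barely moves the score (and a small negative drop is possible, which is what a negative weight in the eli5 table means).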

As you can observe, there are significant differences in the variables that influence the two models, and for the same variable, say ‘Yellow Card’, the percentage change in accuracy also differs.

Let us now take one variable, say ‘Yellow Card’, that influences both models, and try to find the threshold at which the accuracy changes. We can do this easily with partial dependence plots (PDP).

A partial dependence (PD) plot depicts the functional relationship between input variables and predictions. It shows how the predictions partially depend on the values of the input variables.

For example: we can create a partial dependence plot of the variable ‘Yellow Card’ to understand how changes in its values affect the predictions of the model.
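Conceptually, the PD curve for a feature is computed by holding that feature at each grid value for every row and averaging the model’s predictions. A hand-rolled sketch on synthetic data (a stand-in for the real dataset, with a threshold effect at 3 built in to mimic the ‘Yellow Card’ pattern discussed below):

```python
# Hand-rolled partial dependence for one feature:
# fix the feature at each grid value for every row, average the predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 6, size=(100, 2)).astype(float)
y = (X[:, 0] >= 3).astype(int)            # synthetic threshold effect at 3
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

grid = np.arange(0, 6)
pd_curve = []
for v in grid:
    X_mod = X.copy()
    X_mod[:, 0] = v                        # hold feature 0 at value v everywhere
    pd_curve.append(model.predict_proba(X_mod)[:, 1].mean())

for v, p in zip(grid, pd_curve):
    print(v, round(p, 2))
```

The averaged predictions jump at the built-in threshold, which is exactly the kind of step the PDP plots below reveal for ‘Yellow Card’.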

We’ll start with the decision tree model first:

# Import the libraries

from matplotlib import pyplot as plt
from pdpbox import pdp, info_plots

# Select the variable/feature to plot

feature_to_plot = 'Yellow Card'
features_input = test_x.columns.tolist()
print(features_input)

# PDP plot for the decision tree model (newer pdpbox API)

pdp_yl = pdp.PDPIsolate(model=dt_model, df=test_x,
                        model_features=features_input,
                        feature=feature_to_plot, feature_name=feature_to_plot)

fig, axes = pdp_yl.plot(center=True, plot_lines=False, plot_pts_dist=True,
                        to_bins=False, engine='matplotlib')
fig.set_figheight(6)

# Import the libraries

from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots

# Select the variable/feature to plot

feature_to_plot = 'Distance Covered (Kms)'
feature_names = test_x.columns.tolist()

# PDP plot for the decision tree model (older pdpbox API)

pdp_dist = pdp.pdp_isolate(model=dt_model, dataset=test_x,
                           model_features=feature_names,
                           feature=feature_to_plot)

pdp.pdp_plot(pdp_dist, feature_to_plot)
plt.show()

The plot will look like this:

PDP Plot for the Decision Tree model (Image by Author)

If the number of yellow cards is more than 3, that negatively impacts ‘Man of the Match’; but if the number of yellow cards is < 3, it does not influence the model. Also, after 5 yellow cards, there is no significant effect on the model.

The PDP (partial dependence plot) helps to provide insight into the threshold values of the features that influence the model.

Now we can use the same code for the random forest model and look at the plot:

PDP Plot for the Random Forest model (Image by Author)

For both the decision tree model and the random forest model the plot looks similar, with the performance of the model changing in the range of 3 to 5, after which the variable ‘Yellow Card’ has very little influence on the model, as shown by the flat line.

This is how we can use simple PDP plots to understand the behaviour of influential variables in a model. This information not only yields insights about the variables that influence the model, but is also very helpful when training models and selecting the right features. The thresholds can also help to create bins that can be used to sub-set the features, which can further enhance the accuracy of the model. In turn, this helps to make the model results explainable to the business.
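For example, the thresholds read off the PDP (3 and 5 yellow cards in the plots above) can be turned into a binned feature. A minimal sketch with numpy, on made-up values:

```python
# Bin a feature using PDP-derived thresholds (3 and 5, from the plots above)
import numpy as np

yellow_cards = np.array([0, 1, 2, 3, 4, 5, 6, 7])   # illustrative values
bins = np.digitize(yellow_cards, bins=[3, 5])        # 0: <3, 1: 3-4, 2: >=5
print(bins)                                          # [0 0 0 1 1 2 2 2]
```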

Please refer to this link on Github for the dataset and the full code.

I can be reached on Medium, LinkedIn or Twitter in case of any questions/comments.

You can follow me and subscribe to my email list 📩 here, so that you don’t miss out on my latest articles.

    References:

[1] Abraham Itzhak Weinberg, Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification (Feb 2019), Springer

[2] Leo Breiman, Random Forests (Oct 2001), Springer

[3] Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin, Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (2013), The Wharton School of the University of Pennsylvania, arxiv.org




