We've all been in that moment, right? Staring at a chart as if it's some ancient script, wondering how we're supposed to make sense of it all. That's exactly how I felt when I was asked to explain the AUC for the ROC curve at work recently.
Although I had a solid understanding of the math behind it, breaking it down into simple, digestible terms proved to be a challenge. I realized that if I was struggling with it, others probably were too. So, I decided to write this article to share an intuitive way to understand the AUC-ROC curve through a practical example. No dry definitions here, just clear, simple explanations focused on the intuition.
Here's the code [1] used in this article.
Every data scientist goes through a phase of evaluating classification models. Amidst an array of evaluation metrics, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are indispensable tools for gauging a model's performance. In this comprehensive article, we will discuss the basic concepts and see them in action using our good old Titanic dataset [2].
Part 1: ROC Curve
At its core, the ROC curve visually portrays the delicate balance between a model's sensitivity and specificity across varying classification thresholds.
To fully grasp the ROC curve, let's delve into the concepts:
- Sensitivity/Recall (True Positive Rate): Sensitivity quantifies a model's ability to correctly identify positive instances. In our Titanic example, sensitivity corresponds to the proportion of actual survival cases that the model accurately labels as positive.
- Specificity (True Negative Rate): Specificity measures a model's proficiency in correctly identifying negative instances. For our dataset, it represents the proportion of actual non-survived cases (Survival = 0) that the model correctly identifies as negative.
- False Positive Rate: FPR measures the proportion of negative instances that are incorrectly classified as positive by the model.
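
In confusion-matrix terms, the three definitions above can be written compactly. The symbols TP, FN, TN and FP (counts of true positives, false negatives, true negatives and false positives) are notation assumed here for illustration; they do not come from the article's own figures.

```latex
\begin{aligned}
\text{Sensitivity (TPR)} &= \frac{TP}{TP + FN} \\
\text{Specificity (TNR)} &= \frac{TN}{TN + FP} \\
\text{FPR} &= \frac{FP}{FP + TN}
\end{aligned}
```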

Notice that Specificity and FPR are complementary to each other. While specificity focuses on the correct classification of negative instances, FPR focuses on the incorrect classification of negative instances as positive. Thus, FPR = 1 - Specificity.

Now that we know the definitions, let's work with an example. For the Titanic dataset, I have built a simple logistic regression model that predicts whether a passenger survived the shipwreck or not, using the following features: Passenger Class, Sex, # of siblings/spouses aboard, passenger fare and Port of Embarkation. Note that the model predicts the 'probability of survival'. The default threshold for logistic regression in sklearn is 0.5. However, this default threshold may not always make sense for the problem being solved, and we need to play around with the probability threshold, i.e. if the predicted probability > threshold, the instance is predicted to be positive, else negative.
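As a rough illustration of the thresholding rule just described, here is a minimal sketch in plain Python. The `classify` helper and the six probabilities are made up for this example; they are not the article's model output or part of sklearn.

```python
def classify(probabilities, threshold=0.5):
    """Label an instance positive (1) when its predicted probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical predicted survival probabilities for six passengers (illustrative only)
probs = [0.92, 0.71, 0.55, 0.48, 0.30, 0.08]

default_labels = classify(probs)         # default cutoff of 0.5
strict_labels = classify(probs, 0.75)    # stricter cutoff: fewer positives
lenient_labels = classify(probs, 0.25)   # more lenient cutoff: more positives

print(default_labels)  # [1, 1, 1, 0, 0, 0]
print(strict_labels)   # [1, 0, 0, 0, 0, 0]
print(lenient_labels)  # [1, 1, 1, 1, 1, 0]
```

The same predicted probabilities produce three different sets of labels, which is why every metric built on those labels depends on the chosen threshold.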
Now, let's revisit the definitions of Sensitivity, Specificity and FPR above. Since our predicted binary classification depends on the probability threshold, for the given model, these three metrics will change based on the probability threshold we use. If we use a higher probability threshold, we will classify fewer cases as positive, i.e. our true positives will be fewer, resulting in lower Sensitivity/Recall. A higher probability threshold also means fewer false positives, so a lower FPR. As such, increasing sensitivity/recall could lead to an increased FPR.
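The trade-off can be checked on a toy example. The sketch below assumes a hypothetical helper `tpr_fpr` and eight invented label/probability pairs (1 = survived); it also assumes both classes are present, otherwise the divisions would fail.

```python
def tpr_fpr(y_true, probabilities, threshold):
    """Return (TPR, FPR) when instances with probability > threshold are called positive."""
    tp = fp = tn = fn = 0
    for actual, p in zip(y_true, probabilities):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    # Assumes at least one actual positive and one actual negative
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels and predicted probabilities (illustrative only)
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]

low_tpr, low_fpr = tpr_fpr(y_true, probs, 0.30)
high_tpr, high_fpr = tpr_fpr(y_true, probs, 0.65)

print(f"threshold 0.30: TPR={low_tpr:.2f}, FPR={low_fpr:.2f}")    # TPR=1.00, FPR=0.50
print(f"threshold 0.65: TPR={high_tpr:.2f}, FPR={high_fpr:.2f}")  # TPR=0.50, FPR=0.25
```

Raising the cutoff from 0.30 to 0.65 lowers both rates on this toy data, which is the pattern the paragraph above describes.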
For our training data, we will use 10 different probability cutoffs, calculate Sensitivity/TPR and FPR for each, and plot them in the chart below. Note that the size of the circles in the scatterplot corresponds to the probability threshold used for classification.

Well, that's it. The graph we created above, which plots Sensitivity (TPR) vs. FPR at various probability thresholds, IS the ROC curve!
In our experiment, we used 10 different probability cutoffs with an increment of 0.1, giving us 10 observations. If we use a smaller increment for the probability threshold, we will end up with more data points and the graph will look like our familiar ROC curve.
To confirm our understanding, for the model we built for predicting a passenger's survival, we will loop through various predicted probability thresholds and calculate TPR and FPR for the testing dataset (see code snippet below). We then plot the results in a graph and compare this graph with the ROC curve plotted using sklearn's roc_curve [3].
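
The threshold loop described above might look roughly like the following sketch. It is not the article's original snippet: `roc_points`, its `n_thresholds` parameter and the toy data are assumptions made for illustration, and the grid includes a cutoff of 0 so that the curve reaches the top right corner. As far as I know, sklearn's roc_curve derives its thresholds from the distinct predicted scores rather than from a fixed grid, so its points need not coincide exactly with these.

```python
def roc_points(y_true, probabilities, n_thresholds=10):
    """Compute (threshold, FPR, TPR) at evenly spaced cutoffs 0/n, 1/n, ..., n/n."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for i in range(n_thresholds + 1):
        threshold = i / n_thresholds
        # Count predicted positives (probability > threshold) by their actual class
        tp = sum(1 for y, p in zip(y_true, probabilities) if p > threshold and y == 1)
        fp = sum(1 for y, p in zip(y_true, probabilities) if p > threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical test labels (1 = survived) and predicted probabilities (illustrative only)
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
probs = [0.92, 0.81, 0.67, 0.58, 0.43, 0.36, 0.22, 0.11]

coarse = roc_points(y_true, probs, n_thresholds=10)   # increment of 0.1
fine = roc_points(y_true, probs, n_thresholds=100)    # increment of 0.01

for threshold, fpr, tpr in coarse:
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the FPR values on the x-axis against the TPR values on the y-axis gives the hand-built ROC curve; a finer grid simply adds more points along the same path.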

As we can see, the two curves are almost identical. Note that the AUC=0.92 was calculated using the roc_auc_score [4] function. We will discuss this AUC in the later part of this article.
To summarize, the ROC curve plots TPR and FPR for the model at various probability thresholds. Note that the actual probabilities are NOT displayed in the graph, but one can assume that the observations on the lower left side of the curve correspond to higher probability thresholds (low TPR), and the observations on the top right side correspond to lower probability thresholds (high TPR).
To visualize what's stated above, refer to the chart below, where I have tried to annotate TPR and FPR at different probability cutoffs.

Part 2: AUC
Now that we have developed some intuition around what the ROC curve is, the next step is to understand the Area Under the Curve (AUC). But before delving into the specifics, let's think about what a perfect classifier looks like. In the ideal case, we want the model to achieve perfect separation between positive and negative observations. In other words, the model assigns low probabilities to negative observations and high probabilities to positive observations with no overlap. Thus, there will exist some probability cutoff such that all observations with predicted probability < cutoff are negative, and all observations with probability >= cutoff are positive. When this happens, the True Positive Rate will be 1 and the False Positive Rate will be 0. So the ideal state to achieve is TPR=1 and FPR=0. In reality, this does not happen, and a more practical expectation is to maximize TPR and minimize FPR.
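To make the ideal case concrete, here is a small sketch with perfectly separated toy scores. The helper `rates_at_cutoff` and the data are hypothetical; the only thing taken from the text is the rule that a probability >= cutoff counts as positive.

```python
def rates_at_cutoff(y_true, probabilities, cutoff):
    """Return (TPR, FPR) when probability >= cutoff is predicted positive."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    tp = sum(1 for y, p in zip(y_true, probabilities) if p >= cutoff and y == 1)
    fp = sum(1 for y, p in zip(y_true, probabilities) if p >= cutoff and y == 0)
    return tp / positives, fp / negatives


# Hypothetical, perfectly separated scores: every negative sits below every positive
y_true = [0, 0, 0, 1, 1, 1]
probs = [0.05, 0.20, 0.35, 0.60, 0.85, 0.95]

tpr, fpr = rates_at_cutoff(y_true, probs, cutoff=0.5)
print(tpr, fpr)  # 1.0 0.0
```

Any cutoff between 0.35 and 0.60 would give the same result here, because no negative score reaches the lowest positive score.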
In general, as TPR increases with a lowering probability threshold, the FPR also increases (see chart 1). We want TPR to be much higher than FPR. This is characterized by an ROC curve that is bent towards the top left side. The following ROC space chart shows the perfect classifier with a blue circle (TPR=1 and FPR=0). Models that yield an ROC curve closer to the blue circle are better. Intuitively, it means that the model is able to fairly separate negative and positive observations. Among the ROC curves in the following chart, light blue is best, followed by green and orange. The dashed diagonal line represents random guesses (think of a coin flip).

Now that we understand that ROC curves skewed to the top left are better, how do we quantify this? Well, mathematically, this can be quantified by calculating the Area Under the Curve. The Area Under the Curve (AUC) of the ROC curve is always between 0 and 1 because our ROC space is bounded between 0 and 1 on both axes. Among the above ROC curves, the model corresponding to the light blue ROC curve is better than green and orange as it has a higher AUC.
But how is AUC calculated? Computationally, AUC involves integrating the ROC curve. For models producing discrete predictions, AUC can be approximated using the trapezoidal rule [6]. In its simplest form, the trapezoidal rule works by approximating the region under the graph as a trapezoid and calculating its area. I'll probably discuss this in another article.
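As a quick sketch of the idea, the trapezoidal rule can be applied to a handful of ROC points in a few lines of plain Python. The function `auc_trapezoid` and the ROC points are invented for illustration and are not taken from the article's notebook or from sklearn.

```python
def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through (fpr, tpr) points via the trapezoidal rule."""
    points = sorted(zip(fpr, tpr))  # order the points from left to right
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each neighbouring pair of points bounds a trapezoid of width (x1 - x0)
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points (illustrative only)
fpr = [0.0, 0.0, 0.25, 0.25, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 0.75, 1.0, 1.0]

print(auc_trapezoid(fpr, tpr))                    # 0.84375
print(auc_trapezoid([0.0, 1.0], [0.0, 1.0]))      # 0.5, the random-guess diagonal
print(auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0, the perfect classifier
```

The two reference cases match the picture from the ROC space: the diagonal covers half of the unit square, and a curve through the top left corner covers all of it.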
This brings us to the last and the most awaited part: how do we intuitively make sense of AUC? Let's say you built a first version of a classification model with AUC 0.7 and you later fine-tune the model. The revised model has an AUC of 0.9. We understand that the model with the higher AUC is better. But what does it really mean? What does it imply about our improved prediction power? Why does it matter? Well, there is a lot of literature explaining AUC and its interpretation. Some of it is too technical, some incomplete, and some outright wrong! One interpretation that made the most sense to me is:
AUC is the probability that a randomly chosen positive instance possesses a higher predicted probability than a randomly chosen negative instance.
Let's verify this interpretation. For the simple logistic regression we built, we will visualize the predicted probabilities of the positive and negative classes (i.e. survived the shipwreck or not).

We can see the model performs pretty well in assigning a higher probability to Survived cases than to those that didn't survive. There is some overlap of probabilities in the middle section. The AUC calculated using the roc_auc_score function in sklearn for our model on the test dataset is 0.92 (see chart 2). So based on the above interpretation of AUC, if we randomly choose a positive instance and a negative instance, the probability that the positive instance will have a higher predicted probability than the negative instance should be ~92%.
For this purpose, we will create pools of predicted probabilities of positive and negative outcomes. Now we randomly select one observation from each pool and compare their predicted probabilities. We repeat this 100K times. Later we calculate the % of times the predicted probability of the positive instance was > the predicted probability of the negative instance. If our interpretation is correct, this should be approximately equal to the AUC.
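A minimal sketch of this sampling experiment is shown below, using only the standard library. The two pools, the helper names and the fixed seed are assumptions for illustration; the article's own experiment used the Titanic model's predicted probabilities. An exact count over all positive/negative pairs is included as a reference value for the simulation.

```python
import random


def pairwise_auc(pos_scores, neg_scores):
    """Exact fraction of (positive, negative) pairs where the positive scores higher (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def simulated_auc(pos_scores, neg_scores, n_trials=100_000, seed=0):
    """Estimate the same quantity by repeatedly drawing one score from each pool."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(n_trials):
        if rng.choice(pos_scores) > rng.choice(neg_scores):
            wins += 1
    return wins / n_trials


# Hypothetical pools of predicted probabilities (illustrative only)
pos_pool = [0.92, 0.81, 0.58, 0.36]   # passengers who survived
neg_pool = [0.67, 0.43, 0.22, 0.11]   # passengers who did not survive

exact = pairwise_auc(pos_pool, neg_pool)
estimate = simulated_auc(pos_pool, neg_pool)

print(exact)  # 0.8125
print(round(estimate, 2))
```

On these toy pools, 13 of the 16 possible pairs rank the positive higher, and the simulated fraction should land close to that value.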

We did indeed get 0.92! Hope this helps.
Let me know your comments and feel free to connect with me on LinkedIn.
Note: this article is a revised version of the original article that I wrote on Medium in 2023.
References:
- https://github.com/Swpnilsp/ROC-AUC-Curve/blob/main/RoC_Curve_Analysis%20(2).ipynb
- https://www.kaggle.com/competitions/titanic/data (License-CC0: Public Domain)
- https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve
- https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
- https://en.wikipedia.org/wiki/Receiver_operating_characteristic
- https://en.wikipedia.org/wiki/Trapezoidal_rule