Receiver Operating Characteristic (ROC) Curve

December 26, 2020

A Receiver Operating Characteristic (ROC) curve is used to analyze the results of a binary classification model.

A classification model outputs probabilities. A cutoff decides whether a probability is classified as a positive or a negative output. By default, the cutoff is assumed to be 0.5, so a probability below 0.5 belongs to the negative class and a probability greater than or equal to 0.5 belongs to the positive class. But this is only an assumption. An ROC curve helps us adjust this assumption to best suit the problem at hand, i.e. whether we want to decrease or increase the number of positives in the predictions.
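As a minimal sketch of how the cutoff changes the number of predicted positives, the snippet below applies three cutoffs to the same probabilities. The probability values and the `classify` helper are made up for illustration; they do not come from any particular model or library.

```python
# Hypothetical predicted probabilities of the positive class.
probabilities = [0.10, 0.35, 0.50, 0.65, 0.90]


def classify(probs, cutoff=0.5):
    """Label a probability 1 (positive) if it is >= cutoff, else 0 (negative)."""
    return [1 if p >= cutoff else 0 for p in probs]


default_labels = classify(probabilities)              # default cutoff of 0.5
strict_labels = classify(probabilities, cutoff=0.7)   # higher cutoff, fewer positives
lenient_labels = classify(probabilities, cutoff=0.3)  # lower cutoff, more positives

print(default_labels)
print(strict_labels)
print(lenient_labels)
```

Raising the cutoff can only keep or reduce the number of predicted positives, and lowering it can only keep or increase that number.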

The cutoff is varied between 0 and 1, and two metrics are computed at each value and plotted in an ROC curve: True Positive Rate (TPR) on the y-axis and False Positive Rate (FPR) on the x-axis. Only TPR and FPR values are plotted on the graph. The corresponding probability cutoffs are kept separately in a table.

Now coming to the concept of TPR and FPR -

Every predicted positive is either true (right) or false (wrong).
Every predicted negative is either true (right) or false (wrong).
In an ROC curve, we are only concerned with predicted positives.
TPR = (True Positives)/(Actual Positives). Ideally 1.
FPR = (False Positives)/(Actual Negatives). Ideally 0.
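The two formulas above can be sketched in a few lines of Python. The labels below are invented for illustration, and the `tpr_fpr` helper is a hypothetical name; it assumes both classes appear at least once in the true labels, otherwise a denominator would be zero.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 = positive and 0 = negative."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    actual_positives = sum(1 for t in y_true if t == 1)
    actual_negatives = len(y_true) - actual_positives
    # Assumes both classes are present, so neither denominator is zero.
    return true_positives / actual_positives, false_positives / actual_negatives


# Hypothetical actual labels and predicted labels at one fixed cutoff.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)  # 3 of 4 positives caught, 1 of 4 negatives flagged
```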

In an ROC curve -

When the cutoff is 0, every input is predicted positive, so TPR = 1 and FPR = 1. The graph starts from the top right corner (1,1).
When the cutoff is 1, essentially no input is predicted positive, so TPR = 0 and FPR = 0. The graph ends at the bottom left corner (0,0).
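To see the two end points appear, here is a rough sketch that sweeps a handful of cutoffs over a tiny made-up dataset and records one (cutoff, FPR, TPR) row per cutoff. The data, the cutoff grid, and the `roc_points` helper are illustrative assumptions; the sketch also assumes no score is exactly 1.

```python
# Hypothetical true labels and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]


def roc_points(y_true, y_score, cutoffs):
    """Return a list of (cutoff, FPR, TPR) rows, one per cutoff."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rows = []
    for cutoff in cutoffs:
        # An input is predicted positive when its score is >= cutoff.
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= cutoff and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= cutoff and t == 0)
        rows.append((cutoff, fp / n_neg, tp / n_pos))
    return rows


points = roc_points(y_true, y_score, cutoffs=[0.0, 0.25, 0.5, 0.75, 1.0])
for cutoff, fpr, tpr in points:
    print(f"cutoff={cutoff:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The first row (cutoff 0) lands on (1,1) and the last row (cutoff 1) lands on (0,0), with the intermediate cutoffs tracing the curve between them.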

There is a concept of a threshold cutoff in an ROC curve, which helps us choose the point on the curve where TPR is highest relative to FPR, i.e. the point that lies farthest above the diagonal. In practice, this threshold cutoff is calculated as the cutoff where (TPR - FPR) is maximum, a quantity known as Youden's J statistic.
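A small sketch of that selection rule, assuming the ROC table is already available as rows of (cutoff, FPR, TPR). The numbers in `roc_table` are invented for illustration only.

```python
# Hypothetical (cutoff, FPR, TPR) rows, as one would read them from the ROC table.
roc_table = [
    (0.9, 0.00, 0.30),
    (0.7, 0.10, 0.60),
    (0.5, 0.20, 0.85),
    (0.3, 0.50, 0.95),
    (0.1, 1.00, 1.00),
]

# Pick the row where TPR - FPR is largest.
best_cutoff, best_fpr, best_tpr = max(roc_table, key=lambda row: row[2] - row[1])

print(best_cutoff, best_fpr, best_tpr)
```

This rule weighs false positives and false negatives equally. If one kind of error is costlier for the problem at hand, a different point on the curve may be the better choice.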

Another metric associated with an ROC curve is AUC (Area Under the Curve). AUC ranges between 0 and 1. The higher the AUC, the better the model.

AUC ~ 0 : The model misclassifies almost all inputs (its predictions are inverted)
AUC = 0.5 : As good as a random guess
AUC ~ 1 : The model classifies almost every input correctly
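One way to make these three cases concrete is to use the fact that AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half. The sketch below computes AUC that way on made-up scores; `auc_by_ranking` is a hypothetical helper written for illustration, not a library function.

```python
def auc_by_ranking(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
perfect = auc_by_ranking(y_true, [0.1, 0.2, 0.8, 0.9])   # positives always score higher
inverted = auc_by_ranking(y_true, [0.9, 0.8, 0.2, 0.1])  # positives always score lower
constant = auc_by_ranking(y_true, [0.5, 0.5, 0.5, 0.5])  # scores carry no information

print(perfect, inverted, constant)
```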

Python Implementation: sklearn.metrics -> roc_curve, roc_auc_score
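A minimal usage sketch of the two functions, assuming scikit-learn is installed. The labels and scores are a tiny made-up dataset; in practice `y_score` would come from a fitted model, for example the positive-class column of `predict_proba`.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns the FPR values, the TPR values, and the cutoffs that produced them.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

for cutoff, f, t in zip(thresholds, fpr, tpr):
    print(f"cutoff={cutoff}  FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC={auc:.2f}")
```

Note that `roc_curve` returns the cutoffs in decreasing order, so its arrays run from (0,0) to (1,1), and its first cutoff is an artificial value above the highest score that represents "predict nothing as positive".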
