Receiver Operating Characteristic (ROC) Curve
A Receiver Operating Characteristic (ROC) curve is used to analyze the results of a binary classification model.
A classification model typically outputs a probability for each input. A cutoff decides whether that probability is classified as a positive or a negative output. By default, the cutoff is assumed to be 0.5: a probability below 0.5 belongs to the negative class, and a probability greater than or equal to 0.5 belongs to the positive class. But this is only an assumption. An ROC curve helps us adjust the cutoff to best suit the problem at hand, i.e. whether we want to decrease or increase the number of positives in the predictions.
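As a minimal sketch of the cutoff rule described above, the snippet below labels a handful of made-up probabilities at three different cutoffs. The values and the helper name `classify` are illustrative assumptions, not the output of any particular model.

```python
# Hypothetical predicted probabilities for six inputs (illustrative values only).
probabilities = [0.10, 0.35, 0.50, 0.62, 0.80, 0.95]


def classify(probs, cutoff=0.5):
    """Label a probability 1 (positive) if it is >= cutoff, else 0 (negative)."""
    return [1 if p >= cutoff else 0 for p in probs]


default_labels = classify(probabilities)       # cutoff 0.5
strict_labels = classify(probabilities, 0.7)   # higher cutoff, fewer positives
lenient_labels = classify(probabilities, 0.3)  # lower cutoff, more positives

print(default_labels)
print(strict_labels)
print(lenient_labels)
```

Raising the cutoff can only remove positives and lowering it can only add them, which is the knob the ROC curve explores.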
The cutoff is varied between 0 and 1, and two metrics are computed at each value: True Positive Rate (TPR) and False Positive Rate (FPR). Each cutoff yields one point on the ROC curve, with FPR on the x-axis and TPR on the y-axis. Only the TPR and FPR values are plotted on the graph; the cutoff that produced each point is usually kept separately in a table.
TPR and FPR are defined as follows:
- Every predicted positive is either true (right) or false (wrong).
- Every predicted negative is either true (right) or false (wrong).
- In an ROC curve, we are only concerned with the predicted positives: how many of them are right, and how many are wrong.
- TPR = (True Positives)/(Actual Positives). Ideally 1.
- FPR = (False Positives)/(Actual Negatives). Ideally 0.
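The two formulas above can be sketched in a few lines of plain Python. The labels below are made up for illustration, and `tpr_fpr` is a hypothetical helper name; the sketch assumes the data contains at least one actual positive and one actual negative.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels, where 1 = positive and 0 = negative."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    actual_neg = sum(1 for t in y_true if t == 0)
    return true_pos / actual_pos, false_pos / actual_neg


# Illustrative data: 4 actual positives followed by 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Here 3 of the 4 actual positives are caught (TPR = 0.75) and 1 of the 6 actual negatives is flagged by mistake (FPR of about 0.167).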
In an ROC curve:
- When the cutoff is 0, every input is predicted positive, so TPR = 1 and FPR = 1. The graph starts from the top right corner (1,1).
- When the cutoff is 1, (almost) no input is predicted positive, so TPR = 0 and FPR = 0. The graph ends at the bottom left corner (0,0).
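To see the two end points come out of the arithmetic, here is a rough sketch that sweeps a few cutoffs over made-up scores and builds the table of (cutoff, FPR, TPR) rows. The data, the cutoff grid, and the helper name `roc_points` are assumptions for illustration; the sketch also assumes no score is exactly 1, otherwise the last row would not be (0, 0).

```python
# Illustrative labels and predicted probabilities (not from a real model).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.50, 0.62, 0.80, 0.95]


def roc_points(y_true, scores, cutoffs):
    """Return one (cutoff, FPR, TPR) row per cutoff, using the >= rule."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rows = []
    for cutoff in cutoffs:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= cutoff and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= cutoff and t == 0)
        rows.append((cutoff, fp / n_neg, tp / n_pos))
    return rows


cutoffs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
table = roc_points(y_true, scores, cutoffs)

for cutoff, fpr, tpr in table:
    print(f"cutoff={cutoff:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the FPR column against the TPR column gives the curve; the cutoff column is the separate table mentioned earlier.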
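An ROC curve also helps us choose an operating cutoff, i.e. a point on the curve where TPR has gained the most relative to FPR. A common choice is the cutoff where (TPR - FPR) is maximum, a quantity known as Youden's J statistic. Geometrically, this is the point on the curve farthest above the diagonal line TPR = FPR. Strictly speaking it is not an inflection point; on a smooth curve it is the point where the slope equals 1, beyond which each further gain in TPR costs a larger increase in FPR.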
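The (TPR - FPR) criterion, commonly known as Youden's J statistic, can be sketched as a one-line search over the cutoff table. The table values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical ROC table: (cutoff, FPR, TPR) rows, illustrative values only.
roc_table = [
    (0.0, 1.00, 1.00),
    (0.2, 0.80, 0.95),
    (0.4, 0.40, 0.90),
    (0.6, 0.15, 0.70),
    (0.8, 0.05, 0.40),
    (1.0, 0.00, 0.00),
]

# Pick the row where TPR - FPR is largest.
best_cutoff, best_fpr, best_tpr = max(roc_table, key=lambda row: row[2] - row[1])

print(f"best cutoff = {best_cutoff}, TPR - FPR = {best_tpr - best_fpr:.2f}")
```

This criterion weighs a missed positive and a false alarm equally. If one kind of error is costlier for the problem at hand, a different point on the curve may be the better choice.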
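Another metric associated with an ROC curve is AUC (Area Under the Curve). AUC ranges between 0 and 1 and can be read as the probability that a randomly chosen positive input receives a higher score than a randomly chosen negative one. In general, the higher the AUC, the better the model.
- AUC ~ 0 : The model ranks almost every negative above every positive, so almost all predictions are wrong (inverting its output would give a near-perfect model).
- AUC = 0.5 : As good as a random guess.
- AUC ~ 1 : The model separates positives from negatives almost perfectly.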
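One simple way to estimate the area is the trapezoidal rule applied to the (FPR, TPR) points. The sketch below is a plain-Python illustration under that assumption, with the hypothetical helper name `auc_trapezoid`; it checks the random-guess and perfect cases from the list above.

```python
def auc_trapezoid(points):
    """Approximate the area under (FPR, TPR) points with the trapezoidal rule."""
    pts = sorted(points)  # order by FPR, then by TPR for ties
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Random guess: the curve lies on the diagonal TPR = FPR.
diagonal = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
# Perfect model: TPR reaches 1 while FPR is still 0.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Fully inverted model: FPR reaches 1 while TPR is still 0.
inverted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

print(auc_trapezoid(diagonal), auc_trapezoid(perfect), auc_trapezoid(inverted))
```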