Precision Recall Curve
A precision-recall (PR) curve is used to analyse the results of a binary classification model. Such a model outputs probabilities, and a cutoff decides whether a probability is classified as a positive or a negative output. By default, this cutoff is assumed to be 0.5: a probability below 0.5 belongs to the negative class, and one greater than or equal to 0.5 belongs to the positive class. But this is only an assumption. A PR curve lets us play with this assumption to best suit the problem at hand, i.e. whether we want to decrease or increase the number of positives in the prediction.
To plot the curve, the cutoff is varied between 0 and 1 and the two metrics, Precision and Recall, are computed at each value. The corresponding probability cutoffs are reported separately in a table.
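The idea of applying a cutoff to probabilities can be sketched as below; the probability values and the cutoffs are purely illustrative:

```python
# Sketch: turning model probabilities into class labels at a chosen cutoff.
# The probabilities and cutoff values here are made up for illustration.
probs = [0.10, 0.45, 0.50, 0.80, 0.95]

def classify(probs, cutoff=0.5):
    # >= cutoff -> positive class (1), otherwise negative class (0)
    return [1 if p >= cutoff else 0 for p in probs]

print(classify(probs))        # default cutoff of 0.5 -> [0, 0, 1, 1, 1]
print(classify(probs, 0.9))   # stricter cutoff -> fewer positives: [0, 0, 0, 0, 1]
```

Raising the cutoff predicts fewer positives (higher Precision, lower Recall); lowering it does the opposite.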
Now coming to the concept of Precision and Recall -
- For all predicted positives, they can be either true (right) or false (wrong).
- For all predicted negatives, they can be either true (right) or false (wrong).
- Precision is a ratio defined as (True Positives)/(Predicted Positives). Ideally 1.
- Recall is a ratio defined as (True Positives)/(Actual Positives). It is also called Sensitivity or True Positive Rate (TPR). Ideally 1.
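These two ratios can be computed directly from the prediction counts; the toy labels below are hypothetical:

```python
# Toy example: computing Precision and Recall from label pairs.
# 1 = positive class, 0 = negative class; the labels are made up.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # True Positives / Predicted Positives
recall = tp / (tp + fn)     # True Positives / Actual Positives

print(precision, recall)  # 0.75 0.75 for this toy data
```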
In a PR curve -
- when the cutoff is 0, all values are predicted positive, so Recall = 1 and Precision is low (it equals the fraction of actual positives in the data). The graph starts at the right edge of the plot, at the point (1, prevalence).
- when the cutoff is 1, nothing is predicted positive, so Recall = 0 and Precision is taken as 1 by convention. The graph ends at the top-left corner, at the point (0, 1).
The concept of a threshold cutoff in a PR curve helps us choose a point on the curve that trades the smallest decrease in Recall for the largest increase in Precision. A common choice is the cutoff where the harmonic mean of Precision and Recall, i.e. the F1 score, is maximum.
Python Implementation: sklearn.metrics -> precision_recall_curve
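A minimal sketch of this with scikit-learn, using made-up labels and scores; `precision_recall_curve` returns one more (precision, recall) pair than thresholds, so the final pair is dropped when computing F1:

```python
# Sketch: find the cutoff with maximum F1 using sklearn's precision_recall_curve.
# The labels and scores below are made up for illustration.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_scores = np.array([0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9, 0.95])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# Harmonic mean (F1) at each threshold; the last (precision, recall)
# pair corresponds to the (0, 1) endpoint and has no threshold.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1])
best = np.argmax(f1)
print("best cutoff:", thresholds[best], "F1:", f1[best])
```

Plotting `recall` against `precision` gives the PR curve itself, and `thresholds` is the table of corresponding probability cutoffs.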