Precision-Recall Curve

A precision-recall (PR) curve is used to analyze the results of a binary classification model.

A classification model outputs probabilities. A cutoff decides whether a probability is classified as a positive or a negative output. By default, this cutoff is assumed to be 0.5: a probability below 0.5 belongs to the negative class, and a probability greater than or equal to 0.5 belongs to the positive class. But this is only a convention. A precision-recall (PR) curve helps us play with this cutoff to best suit the problem at hand, i.e. whether we want to decrease or increase the number of positives in the prediction.
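
For instance, here is a minimal sketch (with made-up probabilities) of how moving the cutoff changes which predictions count as positive:

```python
import numpy as np

# Predicted probabilities of the positive class (illustrative values)
proba = np.array([0.12, 0.48, 0.50, 0.73, 0.91])

# Default cutoff: probabilities >= 0.5 are labelled positive (1)
default_labels = (proba >= 0.5).astype(int)   # [0, 0, 1, 1, 1]

# A stricter cutoff produces fewer positives
strict_labels = (proba >= 0.8).astype(int)    # [0, 0, 0, 0, 1]
```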

To plot a PR curve, the cutoff is swept between 0 and 1, and at each value the two metrics, precision and recall, are computed and plotted against each other. The corresponding probability cutoffs do not appear on the plot itself; they are kept separately in a table (or an array of thresholds).

Now, coming to the concepts of precision and recall -

  • Every predicted positive is either a true positive (right) or a false positive (wrong).
  • Every predicted negative is either a true negative (right) or a false negative (wrong).
  • Precision is a ratio defined as (True Positives)/(Predicted Positives) = TP/(TP + FP). Ideally 1.
  • Recall is a ratio defined as (True Positives)/(Actual Positives) = TP/(TP + FN). It is also called Sensitivity or True Positive Rate (TPR). Ideally 1.
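
As a quick sanity check, here is a minimal sketch (with made-up labels) that computes both ratios directly from the counts:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual labels (made up)
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # predicted labels (made up)

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives

precision = tp / (tp + fp)   # 3/4 = 0.75
recall = tp / (tp + fn)      # 3/4 = 0.75
```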

In a PR curve -

  • when the cutoff is 0, every example is predicted positive, so Recall = 1 while Precision drops to the fraction of actual positives in the data (the prevalence). The curve therefore starts at the right edge, at the point (1, prevalence).
  • when the cutoff is 1, no example is predicted positive, so Recall = 0; Precision is then 0/0 and is conventionally set to 1. The curve ends at the top-left corner, at the point (0, 1).

The threshold cutoff in a PR curve helps us choose a point on the curve where we gain the most precision for the smallest sacrifice in recall. A common rule of thumb is to pick the cutoff where the harmonic mean of precision and recall, i.e. the F1 score, F1 = 2PR/(P + R), is maximum.
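
For example, here is a minimal sketch of picking that cutoff, assuming true labels y_true and predicted positive-class probabilities y_scores are available (both illustrative here):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Illustrative inputs: true labels and predicted positive-class probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.55])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# F1 is the harmonic mean of precision and recall; guard against 0/0
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)

# precision/recall have one more entry than thresholds, so drop the last point
best = np.argmax(f1[:-1])
print("best cutoff:", thresholds[best], "F1:", f1[best])
```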

Python Implementation: sklearn.metrics -> precision_recall_curve
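
A minimal end-to-end sketch of that function on synthetic data (all dataset and model choices here are illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative)
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]   # positive-class probabilities

precision, recall, thresholds = precision_recall_curve(y_test, proba)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```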
 
