Posts

Factor Analysis

Factor analysis is a dimensionality reduction technique. It tries to uncover the inherent latent factors behind the input features. It is based on the correlation matrix of the input features and requires a large sample size, since correlations stabilize only after a large number of data points. It is more exploratory in nature than the other common dimensionality reduction technique, PCA. It also differs from PCA in that PCA is based on the explained-variance concept of the input features. Another difference is that PCA only produces orthogonal components, whereas factor analysis has both variants - oblique rotations (allowing correlated factors - direct oblimin, promax) and orthogonal rotations (uncorrelated factors - varimax, equimax, quartimax). Factor analysis with oblique rotation is generally preferred, as it can still produce orthogonal output if the input features happen to be uncorrelated. Both FA and PCA use the variance-covariance matrix for calculation, but the diagonal values differ. PCA uses 1 in the diagonal values, hen...
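
A minimal sketch, assuming scikit-learn is available (its FactorAnalysis supports only orthogonal rotations such as varimax; oblique rotations like promax or oblimin typically need a separate package such as factor_analyzer):

    # Factor analysis sketch with an orthogonal (varimax) rotation.
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import FactorAnalysis

    X = load_iris().data                          # 4 correlated numeric features
    X_std = StandardScaler().fit_transform(X)     # FA works on standardized features

    fa = FactorAnalysis(n_components=2, rotation="varimax")
    scores = fa.fit_transform(X_std)              # latent factor scores per sample

    print(fa.components_)                         # factor loadings (factors x features)
    print(scores.shape)                           # (150, 2)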

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction technique. It is an unsupervised learning algorithm. Dimensionality refers to the number of input features or columns in a dataset. PCA reduces the number of input features in the model by grouping them together to create new features while preserving as much information as possible. The number of new features (components) produced by PCA is equal to the number of input features. The reasons for using this technique are - (1) provide a smaller set of input features to the model, after having removed unwanted columns and columns having no effect on the output; (2) group columns which are redundant, highly correlated and depict the same underlying concept, since having these extra columns leads to overfitting and unnecessary complexity; (3) model regularization. PCA does lead to some information loss while reducing the features, but it can make the model simpler to understand and increase validation accuracy. PCA uses the covariance matrix. The first co...
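
A minimal sketch, assuming scikit-learn is available, showing how the explained variance per component guides how many components to keep:

    # PCA sketch: fit all components, then inspect explained variance.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = load_iris().data                          # 4 input features
    X_std = StandardScaler().fit_transform(X)     # PCA is sensitive to feature scale

    pca = PCA(n_components=4)                     # as many components as input features
    X_pca = pca.fit_transform(X_std)

    print(pca.explained_variance_ratio_)             # variance captured by each component
    print(np.cumsum(pca.explained_variance_ratio_))  # cumulative share, to pick a cutoff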

Precision Recall Curve

Precision Recall (PR) curve is used for the analysis of results from a classification model. A classification model outputs probabilities. A cutoff decides whether to classify a probability as a positive or a negative output. By default, this is assumed to be 0.5, so a probability below 0.5 belongs to the negative class and a probability greater than or equal to 0.5 belongs to the positive class. But this is an assumption. A precision-recall (PR) curve helps us play with this assumption to best suit the problem at hand, i.e. whether we want to decrease or increase the number of positives from the prediction. This cutoff is varied between 0 and 1 to plot the two metrics, Precision and Recall, in a PR curve. The corresponding probability cutoffs are kept separately in a table. Now coming to the concept of Precision and Recall - for all predicted positives, they can be either true (right) or false (wrong); for all predicted negatives, they can be either true (right) or false (wrong). Precision is a...
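
A minimal sketch, assuming scikit-learn is available (the classifier and synthetic data below are illustrative):

    # Precision-recall curve: precision and recall at each probability cutoff.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_recall_curve

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]       # probability of the positive class

    precision, recall, thresholds = precision_recall_curve(y_test, probs)
    # precision vs recall is what gets plotted; thresholds holds the matching cutoffs
    for p, r, t in list(zip(precision, recall, thresholds))[:5]:
        print(f"cutoff={t:.2f}  precision={p:.2f}  recall={r:.2f}")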

Transfer Learning

Transfer learning refers to using pre-trained deep learning models on datasets similar to the ones they were originally trained on. Pre-trained deep learning models have weights (or coefficients or parameters) determined beforehand by training and optimizing them on extremely huge datasets. These models have some of the highest test accuracies for the specific dataset types they have been trained on and were the de-facto standard, state-of-the-art (SOTA) models when they were first introduced. These models are built using a variation of either Convolutional neural nets (CNNs) or Recurrent neural nets (RNNs). CNNs and RNNs work extremely well on, and are thus mostly used for, datasets that have correlated patterns amongst nearby input values e.g. images, languages, music, videos, time-series and text data. A few examples are - VGG16, ResNet50, InceptionV3 for images; BERT, GPT-2, XLNet, ELMO for NLP. Typically these CNNs and RNNs consist of numerous training layers, with each layer consisting of multiple n...
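
A minimal sketch of the idea, assuming TensorFlow/Keras is available and using the VGG16 model mentioned above (the 10-class head and input size are illustrative assumptions):

    # Transfer learning: reuse frozen pre-trained layers, train only a new head.
    import tensorflow as tf

    base = tf.keras.applications.VGG16(weights="imagenet",    # pre-trained ImageNet weights
                                       include_top=False,     # drop the original classifier head
                                       input_shape=(224, 224, 3))
    base.trainable = False                                     # freeze the pre-trained layers

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),       # new head for the new task
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=5)          # trains only the new head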

Precision, Recall and Accuracy for Classification models

The accuracy of a classification model is judged by whether the predicted class is the same as the actual class. Let us assume the levels of a class are either positive or negative; then - For all predicted positives, they can be either true (right) or false (wrong). For all predicted negatives, they can be either true (right) or false (wrong). Precision is a ratio defined as (True Positives)/(Predicted Positives). Ideally 1. Recall is a ratio defined as (True Positives)/(Actual Positives). It is also called Sensitivity or True Positive Rate (TPR). Ideally 1. Specificity is a ratio defined as (True Negatives)/(Actual Negatives). It is also called Selectivity. Ideally 1. Accuracy is a ratio defined as (True Positives + True Negatives)/(Total Predictions). Ideally 1. Python Implementation: sklearn.metrics -> classification_report
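
A minimal sketch of the classification_report mentioned above, assuming scikit-learn is available (the labels below are illustrative):

    # Precision, recall and accuracy from predicted vs actual classes.
    from sklearn.metrics import classification_report, confusion_matrix

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual classes
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # predicted classes

    print(confusion_matrix(y_true, y_pred))       # [[TN, FP], [FN, TP]]
    print(classification_report(y_true, y_pred))  # precision, recall, f1-score, accuracy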

Receiver Operating Characteristic (ROC) Curve

Receiver Operating Characteristic (ROC) curve is used for the analysis of results from a classification model. A classification model outputs probabilities. A cutoff decides whether to classify a probability as a positive or a negative output. By default, this is assumed to be 0.5, so a probability below 0.5 belongs to the negative class and a probability greater than or equal to 0.5 belongs to the positive class. But this is an assumption. An ROC curve helps us play with this assumption to best suit the problem at hand, i.e. whether we want to decrease or increase the number of positives from the prediction. This cutoff is varied between 0 and 1 to plot two metrics in an ROC curve, namely - True Positive Rate (TPR) and False Positive Rate (FPR). Only TPR and FPR values are plotted on the graph; the corresponding probability cutoffs are kept separately in a table. Now coming to the concept of TPR and FPR - For all predicted positives, they can be either true (right) or false (wrong). For all predicted negatives, they...
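
A minimal sketch, assuming scikit-learn is available (classifier and synthetic data are illustrative):

    # ROC curve: TPR vs FPR for each probability cutoff, plus the AUC summary.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve, roc_auc_score

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]           # probability of the positive class

    fpr, tpr, thresholds = roc_curve(y_test, probs)   # FPR and TPR at each cutoff
    print("AUC:", roc_auc_score(y_test, probs))       # area under the ROC curve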

Analysis of Variance (ANOVA)

The ANOVA technique deals with ascertaining differences between groups within a population. The dependent variable is continuous and the independent variable is categorical. Assumptions - (1) the data is normally distributed; (2) the population and group variances are homogeneous; (3) the samples are random and independent. It uses the F-value statistic, which is the ratio of 'variance among groups'/'variance within groups'. A higher F-value leads to rejection of the null hypothesis. The null hypothesis states that there is no difference between groups; the alternate hypothesis states that at least one group is different. One-way ANOVA: 1 continuous dependent; 1 categorical independent (> 2 levels) (* Special case - t-test: 1 continuous dependent; 1 categorical independent (2 levels)). Two-way ANOVA: 1 continuous dependent; 2 or more categorical independents. Python Implementation: scipy.stats, statsmodels
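
A minimal one-way ANOVA sketch using scipy.stats as mentioned above (the three groups are illustrative data):

    # One-way ANOVA: does at least one group mean differ from the others?
    from scipy import stats

    group_a = [23, 25, 27, 22, 26]
    group_b = [30, 31, 29, 33, 32]
    group_c = [24, 26, 25, 27, 23]

    f_value, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_value:.2f}, p = {p_value:.4f}")   # small p-value -> reject the null hypothesis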