Factor Analysis

April 12, 2021

Factor analysis is a dimensionality reduction technique. It tries to uncover the latent factors behind the input features, and it is based on the correlation matrix of those features. It requires a large sample size, because correlation estimates stabilize only with a large number of data points.
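To make the sample size point concrete, here is a small simulation sketch. It assumes two standard normal features with a true correlation of 0.5; the value, the sample sizes, and the helper name correlation_spread are all illustrative choices. It measures how much the estimated correlation varies across repeated draws.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_R = 0.5  # assumed true correlation between the two features
cov = [[1.0, TRUE_R], [TRUE_R, 1.0]]

def correlation_spread(n_samples, n_repeats=500):
    """Standard deviation of the sample correlation across repeated draws."""
    estimates = []
    for _ in range(n_repeats):
        data = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
        estimates.append(np.corrcoef(data, rowvar=False)[0, 1])
    return float(np.std(estimates))

spread_small = correlation_spread(20)    # small sample
spread_large = correlation_spread(2000)  # large sample
print(spread_small, spread_large)
```

The spread for 20 samples should come out far larger than for 2000, which is why a correlation matrix built on few rows is a shaky foundation for factor analysis.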

It is more exploratory in nature than PCA, the other common dimensionality reduction technique. It also differs from PCA in that PCA looks for components that explain as much of the total variance of the input features as possible. Another difference is that PCA only produces orthogonal components, whereas factor analysis supports both oblique rotations (correlated factors - direct oblimin, promax) and orthogonal rotations (uncorrelated factors - varimax, equimax, quartimax). Factor analysis with oblique rotation is generally preferred, as it still produces near-orthogonal factors when the underlying factors are in fact uncorrelated.

Both FA and PCA start from the correlation matrix (the variance-covariance matrix of the standardized features), but they differ in its diagonal values. PCA keeps 1 on the diagonal, so its output accounts for each feature's total variance: the part explained by other features plus the part unique to that feature, which together sum to 1. FA, on the other hand, puts communalities (the share of each feature's variance that is common with the other features) on the diagonal, so the output factors account only for the common variance in the input features.
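The difference in the diagonal can be sketched with NumPy. The toy data below is made up, and the squared multiple correlation is used as the communality estimate, which is one common starting choice and an assumption here rather than the only option.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 4 features driven by one shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[0.9, 0.8, 0.7, 0.6]]) + 0.5 * rng.normal(size=(200, 4))

# PCA works on the full correlation matrix, whose diagonal is 1 for every feature.
R = np.corrcoef(X, rowvar=False)

# Assumed communality estimate: squared multiple correlation of each feature
# with all the others, computed as 1 - 1 / diag(R^-1).
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Factor analysis works on the "reduced" matrix: same off-diagonal values,
# communalities on the diagonal.
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)

print(np.diag(R))          # total variance per feature
print(np.diag(R_reduced))  # common variance per feature, each below 1
```

The gap between the two diagonals is the unique variance that PCA keeps and factor analysis sets aside.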

The output of factor analysis is a set of factors, each with an eigenvalue and a loading for every input feature. A threshold on the loading values decides which input features are grouped under each output factor. The final number of factors is decided by filtering out factors that have only low loadings.
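As a rough illustration of the threshold step, the sketch below groups features under factors by absolute loading. The loading values, the feature names, the 0.4 cutoff, and the helper assign_features are all hypothetical.

```python
# Hypothetical loadings: one row per input feature, one value per factor.
loadings = {
    "height": [0.82, 0.10],
    "weight": [0.77, 0.21],
    "income": [0.05, 0.88],
    "savings": [0.12, 0.69],
    "shoe_size": [0.25, 0.18],
}
THRESHOLD = 0.4  # assumed cutoff; pick one that suits your study

def assign_features(loadings, threshold):
    """Group features under each factor where |loading| >= threshold."""
    n_factors = len(next(iter(loadings.values())))
    groups = {factor: [] for factor in range(n_factors)}
    for feature, row in loadings.items():
        for factor, value in enumerate(row):
            if abs(value) >= threshold:
                groups[factor].append(feature)
    return groups

groups = assign_features(loadings, THRESHOLD)
print(groups)  # {0: ['height', 'weight'], 1: ['income', 'savings']}
```

Here shoe_size loads weakly on both factors and ends up in neither group. A factor whose list came back empty would be a candidate for dropping.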

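Python Implementation:

from factor_analyzer import FactorAnalyzer

# Choose n_factors and rotation for your data; 3 and "promax" are only placeholders.
fa = FactorAnalyzer(n_factors=3, rotation="promax")
fa.fit(X)  # X: your DataFrame or array of input features (samples x features)
eigenvalues, common_factor_eigenvalues = fa.get_eigenvalues()
loadings = fa.loadings_  # one row per input feature, one column per factor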
