Principal component analysis (PCA) is a dimensionality reduction technique that assumes the variables are correlated and that the relationships among them are linear.
The technique involves first building the covariance matrix of the (mean-centered) observations, and then finding its eigenvalues and eigenvectors. The eigenvectors form a new basis for the data, and each eigenvalue gives the variance along its corresponding eigenvector.
Consequently, the eigenvectors associated with the largest eigenvalues point in the directions of maximal variance. These principal components are the axes that capture (or “explain”) the most variation in the data.
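The procedure above can be sketched in a few lines of NumPy; the synthetic data and variable shapes here are illustrative, not from the text:

```python
import numpy as np

# Toy data: 200 samples of 3 variables, the first two strongly correlated
# (this particular construction is an assumption for the example).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
               2 * z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

# 1. Center the data and build the covariance matrix.
Xc = X - X.mean(axis=0)
cov = (Xc.T @ Xc) / (len(Xc) - 1)

# 2. Eigendecomposition; eigh exploits the symmetry of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Sort by decreasing eigenvalue so the largest-variance direction comes first.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance captured by each principal component.
explained = eigvals / eigvals.sum()
print(explained)
```

Because the first two variables are nearly collinear, the first component alone captures most of the variance, previewing the point about high-dimensional data below.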
In areas as diverse as genomics and recommendation systems, just a few principal components are often sufficient to explain the vast majority of the variation in thousands of dimensions of data.
Principal component analysis can also be viewed as the maximization of a quadratic form: the first principal component is the unit vector w that maximizes wᵀΣw, where Σ is the covariance matrix, and the maximum attained is the largest eigenvalue of Σ.
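A minimal numerical check of this variational view, using an assumed covariance matrix chosen only for illustration:

```python
import numpy as np

# Illustrative symmetric positive-definite covariance matrix (an assumption).
cov = np.array([[4.0, 2.0, 0.0],
                [2.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -1]  # eigenvector belonging to the largest eigenvalue

# Evaluated at the top eigenvector, the quadratic form w @ cov @ w
# equals the largest eigenvalue.
assert np.isclose(top @ cov @ top, eigvals[-1])

# No random unit vector exceeds that value.
rng = np.random.default_rng(1)
for _ in range(1000):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    assert w @ cov @ w <= eigvals[-1] + 1e-12
```

This is the Rayleigh-quotient characterization of the top eigenvalue: constraining w to the unit sphere and maximizing wᵀΣw recovers the first principal component.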