Principal Component Analysis (PCA)

The principal components of a collection of points in real p-dimensional space are a sequence of p unit direction vectors, where the ith vector is the direction of a line that best fits the data while being orthogonal to the first i-1 vectors. Here, a best-fitting line is one that minimizes the average squared perpendicular distance from the points to the line. These directions form an orthonormal basis in which the individual dimensions of the data are linearly uncorrelated. Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data, often keeping only the first few principal components and discarding the rest.
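The definition above can be illustrated with a minimal sketch in Python. It assumes NumPy and obtains the directions from an eigendecomposition of the sample covariance matrix, which is one common route rather than the only one. The function name `principal_components` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def principal_components(X):
    """Return (components, variances) for an (n_samples, p) array X.

    Rows of `components` are unit direction vectors, sorted so that the
    first row is the direction of largest variance.
    """
    Xc = X - X.mean(axis=0)                   # center each column
    cov = np.cov(Xc, rowvar=False)            # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    return eigvecs[:, order].T, eigvals[order]

# Synthetic, correlated 3-D data (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing

components, variances = principal_components(X)

# Change of basis: coordinates of each centered point in the new basis.
scores = (X - X.mean(axis=0)) @ components.T
```

In this sketch, `components @ components.T` is the identity up to rounding (the basis is orthonormal), and the covariance matrix of `scores` is diagonal, which is the sense in which the new coordinates are linearly uncorrelated.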

PCA is used in exploratory data analysis and for building predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data's variation as possible. The first principal component can equivalently be defined as a direction that maximizes the variance of the projected data, and the ith principal component as a direction orthogonal to the first i-1 principal components that maximizes the variance of the projected data.
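Dimensionality reduction by projection can be sketched as follows. This example assumes NumPy and uses the singular value decomposition of the centered data matrix, since the right singular vectors of that matrix are the principal directions. The helper name `pca_project`, the choice k = 2, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_project(X, k):
    """Project (n_samples, p) data onto its first k principal components.

    Returns the projected data, the k component directions (as rows),
    and the fraction of total variance captured by each of them.
    """
    Xc = X - X.mean(axis=0)                              # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)    # s is sorted descending
    components = Vt[:k]                                  # first k principal directions
    Z = Xc @ components.T                                # lower-dimensional coordinates
    explained = s**2 / np.sum(s**2)                      # variance fraction per component
    return Z, components, explained[:k]

# Synthetic 5-D data that lies close to a 2-D plane (illustrative only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

Z, components, ratio = pca_project(X, k=2)
print(Z.shape)        # (200, 2)
print(ratio.sum())    # fraction of total variance kept by the 2 components
```

Because the first component maximizes projected variance, projecting the centered data onto any other unit direction gives a variance no larger than that of `Z[:, 0]`.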

from Principal Component Analysis - Wikipedia

This topic includes the following resources and journeys:

5 items

Principal Component Analysis (PCA) 2 [Python]
Steve Brunton | 7 min | Intermediate | Video | Application
This video describes how the singular value decomposition (SVD) can be used for principal component analysis (PCA) in Python (part 2).

Principal Component Analysis (PCA) 1 [Python]
Steve Brunton | 7 min | Intermediate | Video | Application
This video describes how the singular value decomposition (SVD) can be used for principal component analysis (PCA) in Python (part 1).

Principal Component Analysis (PCA)
Steve Brunton | 13 min | Beginner | Video | Theory
Principal component analysis (PCA) is a workhorse algorithm in statistics, where dominant correlation patterns are extracted from high-dimensional data.

Robust Principal Component Analysis (RPCA)
Steve Brunton | 22 min | Intermediate | Video | Theory
Robust statistics is essential for handling data with corruption or missing entries. This robust variant of principal component analysis (PCA) is now a workhorse algorithm in several fields...