Tuesday, April 7, 2009

Principal Components Analysis (PCA)

PCA is a statistical tool for identifying patterns in data by highlighting the similarities and differences among the data points. It can also be used for lossy compression: keeping only the most significant components reduces the dimensionality of the data.

How it is performed:

1. Get the mean of the data in each dimension and subtract it from each data point. This will produce a data set with zero mean.
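2. Calculate the covariance matrix of the mean-adjusted data.
3. Calculate the eigenvalues and unit eigenvectors of the covariance matrix.
4. Choose how many eigenvalue/eigenvector pairs to keep. The eigenvector with the largest eigenvalue is the most significant component of the data (the direction of greatest variance), so rank the pairs by eigenvalue, largest first, and keep the top ones.
5. Derive the new data set using the selected eigenvalue/eigenvector pairs:
a. take the transpose of the matrix whose columns are the selected unit eigenvectors, so that each eigenvector becomes a row
b. multiply it on the left of the transpose of the mean-adjusted data set from step 1 (one data point per column)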
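
The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy is available; the function name pca, the parameter n_components and the sample numbers are made up for this note. np.linalg.eigh is used because the covariance matrix is symmetric, and the covariance uses the sample (n - 1) denominator, which is an assumption.

```python
import numpy as np

def pca(data, n_components):
    """Hypothetical helper: data has one data point per row, one dimension per column."""
    data = np.asarray(data, dtype=float)

    # Step 1: subtract the mean of each dimension so the data has zero mean.
    mean = data.mean(axis=0)
    centered = data - mean

    # Step 2: covariance matrix of the mean-adjusted data (n - 1 denominator).
    cov = np.cov(centered, rowvar=False)

    # Step 3: eigenvalues and unit eigenvectors (returned as columns).
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: rank by eigenvalue, largest first, and keep the top n_components.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order]

    # Step 5: (eigenvectors as rows) times (mean-adjusted data, one point per column).
    projected = components.T @ centered.T
    return projected, components, eigenvalues, mean

# Made-up sample data: 5 points in 2 dimensions.
data = np.array([[1.0, 2.0],
                 [2.0, 4.1],
                 [3.0, 5.9],
                 [4.0, 8.2],
                 [5.0, 9.8]])

# Keep only the most significant component: 2 numbers per point become 1.
projected, components, eigenvalues, mean = pca(data, n_components=1)

# Lossy reconstruction: map back to the original space and add the mean again.
approx = (components @ projected).T + mean
```

Here projected has one row per kept component and one column per data point. Keeping every component would make the reconstruction exact; dropping the smaller ones is where the compression loss comes from.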

Prerequisites:
- computing the covariance
- calculating eigenvalues and eigenvectors (OK)
- computing the transpose of a matrix (OK)
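
For the covariance prerequisite, a small plain-Python sketch may help. The function names covariance and covariance_matrix are made up for this note, and the n - 1 (sample) denominator is an assumption; some texts divide by n instead.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long lists of numbers (n - 1 denominator)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average product of the deviations from each mean.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def covariance_matrix(dimensions):
    """dimensions is a list of lists, one inner list per dimension of the data."""
    return [[covariance(a, b) for b in dimensions] for a in dimensions]

# Made-up values: the second dimension falls as the first one rises.
matrix = covariance_matrix([[1, 2, 3], [3, 2, 1]])
```

The diagonal of the matrix holds the variance of each dimension, and the matrix is symmetric because covariance(a, b) equals covariance(b, a).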

next topic: Reinforcement Learning