PCA

Regression and classification are confirmatory
• Answers to particular questions
• Does wine-drinking influence heart disease? (regression)
• How well can we separate normal from abnormal ECGs? (classification)
• Supervised: solutions are governed by the outcome variable

PCA is exploratory
• Explore examples of typical (common) observations in your data set.
• Unsupervised, no outcome variable: let the data speak for itself.
• Structure in data
• Outlier detection
• Dimensionality reduction (data compression)

PCA and the Curse of Dimensionality

The Volume Explosion:
• In 1-D: 100 points occupy 75% of the space.
• In 10-D: You need 100^10 (= 10^20) points to get the same coverage.
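The arithmetic behind this count can be sketched in a few lines of Python. The sketch assumes "same coverage" means keeping the same spacing of 100 points along every axis, so a d-dimensional grid needs 100^d points. The function name `points_for_same_coverage` is illustrative and not from the slides.

```python
def points_for_same_coverage(points_per_axis: int, dims: int) -> int:
    """Grid points needed to keep the 1-D spacing along every one of `dims` axes."""
    return points_per_axis ** dims


# The required sample size grows exponentially with the dimension.
for d in (1, 2, 3, 10):
    print(f"{d:>2}-D: {points_for_same_coverage(100, d):.1e} points")
```

Already at 3-D the grid needs a million points. At 10-D the count is far beyond any realistic data set, which is why high-dimensional data is always sparse.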

The Manifold Hypothesis: High-dimensional data (images, audio) is almost never truly p-dimensional. It lives on a low-dimensional manifold. Why can we still learn from it?

• Clustering: Data is naturally grouped.
• Correlation: Features are redundant.
• Structure: Physical constraints limit variability.

PCA recovers this hidden structure.
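A minimal sketch of what "recovering hidden structure" can look like, assuming synthetic data: 500 observations with 10 correlated features that are really driven by only 2 hidden coordinates plus a little noise. PCA is computed here through the SVD of the centered data matrix. All sizes, the noise level, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: 2 hidden dimensions embedded linearly in 10 observed features.
n, p, k = 500, 10, 2
latent = rng.normal(size=(n, k))        # hidden low-dimensional coordinates
mixing = rng.normal(size=(k, p))        # makes the 10 features redundant (correlated)
X = latent @ mixing + 0.05 * rng.normal(size=(n, p))  # small measurement noise

# PCA via SVD: center the data, then decompose.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = s**2 / np.sum(s**2)         # fraction of variance per principal component
scores = Xc @ Vt[:k].T                  # data expressed in the first k components

print(np.round(explained, 3))
print(scores.shape)
```

In this setup nearly all of the variance falls on the first two components, so the 10 features can be compressed to 2 scores per observation with very little loss.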

Intuition: A high-res photo has 1,000,000 pixels (p). If we change pixels randomly, we get 'noise'. Real photos only occupy a tiny portion of that 1M-dimensional space.
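The contrast between random pixels and structured images can be illustrated on a much smaller scale. The sketch below is a toy stand-in, not real photos: it uses 16×16 images (p = 256), where the "structured" images are brightness gradients controlled by only three numbers each and the "noise" images have every pixel drawn independently. The helper `components_needed` is a hypothetical name for counting how many principal components are required to reach 95% of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, side = 200, 16                        # 200 toy images of 16 x 16 pixels (p = 256)


def components_needed(X, frac=0.95):
    """Smallest number of principal components explaining at least `frac` of the variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, frac) + 1)


# Structured images: horizontal gradient + vertical gradient + overall brightness.
yy, xx = np.mgrid[0:side, 0:side] / (side - 1)
a, b, c = rng.uniform(size=(3, n, 1, 1))
structured = (a * xx + b * yy + c).reshape(n, -1)

# Noise images: every pixel is independent of all the others.
noise = rng.uniform(size=(n, side * side))

print("structured:", components_needed(structured))
print("noise:     ", components_needed(noise))
```

The structured images need only a handful of components because they vary along three directions, while the noise images need a large share of all available components. Real photos sit between these extremes but much closer to the structured case.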
