aka Principal Component Analysis (PCA)
Unsupervised Learning
- Find the direction in which the data varies the most and use it as the first axis.
- Among the directions orthogonal to the first axis, find the one in which the data varies the most and use it as the second axis.
  - Repeat this process until there are as many axes as the data has dimensions.
- This results in a new set of axes ordered by how much of the data's variance each one captures.
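The steps above can be sketched in plain Python. This is a minimal illustration, not how any particular library does it: it uses power iteration to find the highest-variance direction and deflation to remove that direction before looking for the next one (libraries commonly use an SVD instead). The function names and the synthetic data are assumptions made for this example.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of the data (rows = samples, columns = features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_direction(cov, iters=500, seed=0):
    """Power iteration: unit vector along which the variance in `cov` is largest."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start avoids an unlucky orthogonal start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to find
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

def pca_axes(rows):
    """Return [(direction, variance), ...], one entry per dimension, highest variance first."""
    cov = covariance_matrix(rows)
    d = len(cov)
    axes = []
    for _ in range(d):  # repeat {number of dimensions} times
        v, var = top_direction(cov)
        axes.append((v, var))
        # Deflation: subtract the variance already explained by v, so the next
        # pass finds the best direction orthogonal to the axes found so far.
        cov = [[cov[i][j] - var * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes

# Synthetic 3D data for illustration only (fixed seed for repeatability).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.gauss(0, 3)
    data.append([x, 0.5 * x + rng.gauss(0, 1), rng.gauss(0, 0.2)])

axes = pca_axes(data)
for k, (direction, variance) in enumerate(axes, start=1):
    print(k, [round(c, 3) for c in direction], round(variance, 3))
```

The printed variances come out in decreasing order, which is the "axes arranged in order of variability" property described above.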
- When the data is plotted in 2D using the first and second axes (the two with the highest variance), the cancer example looks like this.
  - This is based on the hypothesis that the directions of greatest variance are the most meaningful ones.
  - If the hypothesis is correct, the classes will be well separated in the plot.
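As a rough sketch of the 2D projection, the snippet below uses NumPy and synthetic two-class data as a stand-in; it is not the cancer dataset from these notes. The class layout, the variable names and the choice of SVD are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic stand-in, NOT the cancer data

# Two hypothetical classes in 5 dimensions. They differ mainly along
# feature 0, which therefore also has the largest overall spread.
n = 100
class_a = rng.normal(0.0, 1.0, size=(n, 5)) + np.array([-4.0, 0, 0, 0, 0])
class_b = rng.normal(0.0, 1.0, size=(n, 5)) + np.array([4.0, 0, 0, 0, 0])
X = np.vstack([class_a, class_b])
labels = np.array([0] * n + [1] * n)

# PCA via SVD of the mean-centered data: the rows of Vt are the new axes,
# ordered from highest to lowest variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T  # coordinates on the first and second axes

# The labels are used only to inspect the result, never to compute the axes.
mean_a = X2[labels == 0, 0].mean()
mean_b = X2[labels == 1, 0].mean()
print("shape of 2D projection:", X2.shape)
print("class means on the first axis:", mean_a, mean_b)
```

Here the hypothesis holds by construction, so the two classes end up far apart on the first axis. The columns of `X2` can be passed to a scatter plot colored by `labels`. On data where the classes differ along a low-variance direction, the same projection would not separate them.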
- Think of it as sorting the dimensions by importance, where importance means variance.
- Applications: feature extraction, visualizing high-dimensional data in a single graph.
  - Feature extraction: the transformed data can have a form that is more suitable for downstream analysis or visualization than its original form.
- Challenge: each transformed axis is a combination of all the original features, so its meaning is often hard for humans to interpret.
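To make the interpretability problem concrete, the sketch below prints each new axis as a weighted sum of the original features. The feature names and the synthetic data are hypothetical, chosen only for illustration. The axes are computed with NumPy's `np.linalg.eigh` on the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
feature_names = ["height", "weight", "age"]  # hypothetical names for illustration

# Synthetic data: weight is correlated with height, age is independent.
height = rng.normal(170, 10, size=300)
weight = 0.9 * (height - 170) + rng.normal(65, 5, size=300)
age = rng.normal(40, 12, size=300)
X = np.column_stack([height, weight, age])

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # reorder: highest variance first
components = eigvecs[:, order].T        # one row per new axis

for k, row in enumerate(components, start=1):
    terms = " + ".join(f"{w:+.2f}*{name}" for w, name in zip(row, feature_names))
    print(f"axis {k} = {terms}")
```

An axis that mixes height and weight in roughly equal parts has no ready-made name in the original feature vocabulary, which is why reading meaning into the transformed axes takes extra care.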