Peter Ralph
Advanced Biological Statistics
There are many dimension reduction methods, e.g.:
PCA uses the covariance matrix, which measures similarity.
t-SNE begins with the matrix of distances, measuring dissimilarity.
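To make the two starting points concrete, here is a minimal sketch using NumPy. The data matrix `X` is simulated and hypothetical (it is not from the course materials). The sketch computes the covariance matrix that PCA decomposes and the pairwise Euclidean distance matrix of the kind that t-SNE takes as input.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 samples (rows) by 4 variables (columns).
X = rng.normal(size=(50, 4))
X[:, 1] += 2.0 * X[:, 0]  # make two of the variables covary

# PCA starting point: covariance between variables (a similarity).
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)               # 4 x 4 covariance matrix
evals, evecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
order = np.argsort(evals)[::-1]            # reorder so PC1 comes first
evals, evecs = evals[order], evecs[:, order]
scores = Xc @ evecs                        # coordinates of samples on the PCs

# t-SNE starting point: distances between samples (a dissimilarity).
diffs = X[:, None, :] - X[None, :, :]
D = np.sqrt((diffs ** 2).sum(axis=-1))     # 50 x 50 distance matrix
```

The variance of the scores along each principal component equals the corresponding eigenvalue of `C`, and `D` is symmetric with zeros on the diagonal.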
Are distances interpretable?
metric: In PCA, each axis is a fixed linear combination of variables. So, distances always mean the same thing no matter where you are on the plot.
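non-metric: In t-SNE, the embedding is nonlinear, so the same distance on the plot can correspond to different amounts of dissimilarity in different places. Distances within different clusters are not comparable.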
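The "metric" property of PCA can be checked numerically. The sketch below uses simulated data and hypothetical helper names (`project`, `plot_distance`). It projects a pair of points onto the first two principal components, then moves both points by the same large shift and projects again. Because the projection is a fixed linear map, the distance on the plot does not change.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))              # hypothetical data: 200 samples, 5 variables
mean = X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
W = evecs[:, ::-1][:, :2]                  # loadings of the top two PCs (5 x 2)

def project(points):
    """Map points into the PC1-PC2 plane: a fixed linear map after centering."""
    return (points - mean) @ W

def plot_distance(a, b):
    """Distance between two points as it would appear on the PCA plot."""
    return float(np.linalg.norm(project(a) - project(b)))

p = rng.normal(size=5)
q = rng.normal(size=5)
shift = 10.0 * rng.normal(size=5)          # move both points far away together

near = plot_distance(p, q)
far = plot_distance(p + shift, q + shift)  # same value as `near`, up to rounding
```

Standard t-SNE has no analogous check, since it does not produce a fixed map from the original variables to plot coordinates.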
From ordination.okstate.edu, about ordination in ecology:
Graphical results often lead to intuitive interpretations of species-environment relationships.
A single multivariate analysis saves time, in contrast to a separate univariate analysis for each species.
Ideally and typically, dimensions of this ‘low dimensional space’ will represent important and interpretable environmental gradients.
If statistical tests are desired, problems of multiple comparisons are diminished when species composition is studied in its entirety.
Statistical power is enhanced when species are considered in aggregate, because of redundancy.
By focusing on ‘important dimensions’, we avoid interpreting (and misinterpreting) noise.
Ordination methods are strongly influenced by sampling: ordination may ignore large-scale patterns in favor of describing variation within a highly oversampled area.
Ordination methods also describe patterns common to many variables: measuring the same thing many times may drive results.
Many methods are designed to find clusters, because our brains like to categorize things. This doesn’t mean those clusters are well-separated in reality.
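The caution about measuring the same thing many times can be illustrated with a small simulation. This is a sketch under stated assumptions: the signals `a` and `b`, the noise level, and the helper `pc1_summary` are all hypothetical. Two independent signals of equal variance are measured once each, and then again with `a` measured ten times with a little noise.

```python
import numpy as np

def pc1_summary(X):
    """Return (fraction of variance on PC1, PC1 loadings) from the covariance matrix."""
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return evals[-1] / evals.sum(), evecs[:, -1]   # eigh sorts ascending

rng = np.random.default_rng(2)
n = 500
a = rng.normal(size=n)                     # one underlying signal
b = rng.normal(size=n)                     # an independent signal of equal variance

# Each signal measured once: neither dominates.
X_two = np.column_stack([a, b])

# Signal `a` measured ten times with small noise, `b` still measured once.
copies = [a + 0.1 * rng.normal(size=n) for _ in range(10)]
X_many = np.column_stack(copies + [b])

frac_two, _ = pc1_summary(X_two)
frac_many, load_many = pc1_summary(X_many)
```

With one measurement each, PC1 carries roughly half of the variance. With the repeated measurements, PC1 carries most of the variance and its loading on `b` (the last entry of `load_many`) is close to zero, even though `b` is just as variable as `a`.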
The goal is usually to produce a picture in which similar things are near each other, while also capturing global structure.