\[F_{ik} = c_{1k}y_{i1} + c_{2k}y_{i2} + c_{3k}y_{i3} + c_{4k}y_{i4} + \dots + c_{pk}y_{ip}\]
where \(i\) indicates the observation, \(k\) indicates the new factor, and \(c\) indicates the coefficients.
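The linear combination above can be checked numerically. The sketch below is only an illustration: it assumes the built-in USArrests data (not the data set used in this exercise) and the base R prcomp() function, and the object names Y, C, and F_manual are made up for the example.

```r
# Assumption: USArrests stands in for the course data; any all-numeric data frame would do.
Y <- scale(USArrests)                    # center and standardize: the y_ij values
pca <- prcomp(USArrests, scale. = TRUE)  # PCA on the same standardized variables
C <- pca$rotation                        # matrix of coefficients c_jk (variables x factors)

# F_ik = c_1k * y_i1 + ... + c_pk * y_ip is the matrix product Y %*% C
F_manual <- Y %*% C

dim(C)                                   # p rows (variables) by p columns (new factors)
all.equal(F_manual, pca$x, check.attributes = FALSE)  # compare with prcomp's own scores
```

Each column of C holds the coefficients for one new factor, so each column of F_manual holds the scores of all observations on that factor.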
p) p-by-p matrix
And, let’s print the biplot of the data.
Lastly, let’s look at the scores of the original objects based on our new variables.
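As a minimal sketch of the biplot and score steps, assuming the built-in USArrests data and base R's prcomp() rather than the data and PCA call used in this exercise (the object names pca and pc_scores are hypothetical):

```r
# Assumption: USArrests is a stand-in data set.
pca <- prcomp(USArrests, scale. = TRUE)

# Biplot: observations drawn as labels, original variables drawn as arrows
biplot(pca, cex = 0.6)

# Scores: one row per original object, one column per new variable (PC)
pc_scores <- pca$x
head(pc_scores[, 1:2])
```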
If you have time -
Go back to your single-factor ANOVA examples, and run this type of analysis for the first 2 PCs. Because you have 3 factor levels, set up contrasts as you see fit.
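One possible way to set this up is sketched below. It is an illustration under stated assumptions, not the intended solution: the built-in iris data (with its three-level Species factor) stands in for your ANOVA data, and the two contrasts are arbitrary example choices.

```r
# Assumption: iris stands in for a single-factor ANOVA data set with 3 factor levels.
pca <- prcomp(iris[, 1:4], scale. = TRUE)
dat <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = iris$Species)

# Two orthogonal contrasts (arbitrary example choices):
#   1) setosa vs. the mean of the other two species
#   2) versicolor vs. virginica
contrasts(dat$group) <- cbind(c(2, -1, -1), c(0, 1, -1))

# Single-factor ANOVA on each of the first 2 PCs, with the factor split by contrast
fit_pc1 <- aov(PC1 ~ group, data = dat)
fit_pc2 <- aov(PC2 ~ group, data = dat)
contrast_names <- list(group = list("setosa vs others" = 1,
                                    "versicolor vs virginica" = 2))
summary(fit_pc1, split = contrast_names)
summary(fit_pc2, split = contrast_names)
```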
OK, now practice by performing a PCA on one of the RNAseq.tsv files.
PCoA can be performed with the capscale() function in the package vegan, or with cmdscale() in the base installation, but you will need to produce a distance matrix from the original data.
The capscale() function is designed for another purpose, so the syntax is a bit different from the other ordination methods, but it can be used to perform PCoA.
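A minimal sketch of PCoA with capscale(), assuming the vegan package is installed and using its bundled varespec data as a stand-in for the course data. With only a constant on the right-hand side of the formula there are no constraining variables, so the result is an unconstrained ordination of the chosen distances:

```r
library(vegan)

# Assumption: varespec (vegetation data shipped with vegan) is a stand-in data set.
data(varespec)

# `~ 1` means "no constraining variables", so capscale() performs PCoA
pcoa_cap <- capscale(varespec ~ 1, distance = "bray")

# Site scores on the first two PCoA axes
site_scores <- scores(pcoa_cap, display = "sites", choices = 1:2)
head(site_scores)
plot(pcoa_cap, display = "sites")
```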
To use the cmdscale() function, which is part of the basic R installation, you will need to have a data frame containing only numerical data (there can be row names).
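A minimal sketch of the base R route, assuming the built-in USArrests data as a stand-in (it is all numeric and carries state names as row names). The object names num_data, d, and pcoa are hypothetical:

```r
# Assumption: USArrests is a stand-in for a numeric-only data frame with row names.
num_data <- USArrests

# cmdscale() needs a distance matrix, so build one first
# (Euclidean distance on standardized variables here)
d <- dist(scale(num_data), method = "euclidean")

# Classical multidimensional scaling (PCoA); keep 2 axes and the eigenvalues
pcoa <- cmdscale(d, k = 2, eig = TRUE)

head(pcoa$points)
plot(pcoa$points, xlab = "PCoA 1", ylab = "PCoA 2", type = "n")
text(pcoa$points, labels = rownames(num_data), cex = 0.6)
```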
The vegdist() function has more distances, including some more applicable to (paleo)ecological data.
The distance methods available in vegdist() are: “manhattan”, “euclidean”, “canberra”, “bray”, “kulczynski”, “jaccard”, “gower”, “altGower”, “morisita”, “horn”, “mountford”, “raup”, “binomial” or “chao”, and the default is “bray” (Bray-Curtis).
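To see how the choice of distance plays out, here is a small sketch that runs the same PCoA on three vegdist() distances. It assumes the vegan package is installed and uses its bundled varespec data as a stand-in; the three methods shown are arbitrary picks from the list above.

```r
library(vegan)

# Assumption: varespec (shipped with vegan) is a stand-in community data set.
data(varespec)

d_bray <- vegdist(varespec)                        # default method is "bray"
d_jacc <- vegdist(varespec, method = "jaccard")
d_eucl <- vegdist(varespec, method = "euclidean")

# Same PCoA, different distances: compare the resulting configurations side by side
dists <- list(bray = d_bray, jaccard = d_jacc, euclidean = d_eucl)
op <- par(mfrow = c(1, 3))
for (nm in names(dists)) {
  pts <- cmdscale(dists[[nm]], k = 2)
  plot(pts, main = nm, xlab = "PCoA 1", ylab = "PCoA 2")
}
par(op)
```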
Try changing the distance method in vegdist() to see how it affects your results.
What if we have factor variables that we’d like to use in an analysis?
Use vegdist(). Use decostand() to “normalize,” which accounts for differing total read numbers per sample.
The vegdist() function has more distances, and the decostand() function is a form of normalization.
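A small sketch of why this matters, assuming the vegan package is installed. The count matrix is made up, with samples in rows, and method = "total" (divide each row by its row total) is an assumption about which standardization is intended; decostand() also has a method literally named "normalize", which instead scales each row to unit sum of squares.

```r
library(vegan)

# Made-up counts: sample B has the same composition as A but 10x the total reads
counts <- matrix(c(10,  30,  60,
                   100, 300, 600),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A", "B"), c("gene1", "gene2", "gene3")))

# Assumption: method = "total" is the standardization meant here
rel <- decostand(counts, method = "total")
rowSums(rel)       # every sample now sums to 1

vegdist(counts)    # raw counts: A and B differ only because of read depth
vegdist(rel)       # after standardization the two samples are identical
```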