capscale()
function in the package VEGAN
.cmdscale()
in the base installation, but you would need to produce a distance matrix from the original data.capscale()
function is designed for another purpose, so the syntax is a bit different than the other ordination methods, but it can be used to perform PCoA:vegdist()
vegdist()
function has more distances, including some more applicable to (paleo)ecological data:vegdist()
are: “manhattan”, “euclidean”, “canberra”, “bray”, “kulczynski”, “jaccard”, “gower”, “altGower”, “morisita”, “horn”, “mountford”, “raup” , “binomial” or “chao” and the default is bray or Bray-Curtis.vegdist()
to see how it affects your resultscmdscale
functions as part of the basic R installation you will need to have a data frame containing only numerical data (there can be row names).What if we have factor variables that we’d like to use in an analysis?
vegdist()
.decostand()
to “normalize,” which accounts for differing total read #s per sample.vegdist()
function has more distancesdecostand
function is a form of normalizationR^2
value tell you? Is the model accurately predicting the observed dissimilarity?Both focus on the analysis of a factor variable but with multiple continuous response variables
MANOVA - explicitly test the difference between levels of a factor(s) using the first discriminant function
DFA - defines the set of linear discriminant functions that most clearly differentiate the levels of a factor variable in multivariate space
MANOVA focuses on hypothesis testing and DFA is used more for classification
\[ z_{ik} = constant + c_1y_{i1} + c_2y_{i2} + c_3y_{i3} + ... + c_py_{ip}\]
clay_data <- read.table('Clays_RNAseq.tsv', header=T, sep=‘\t')
head(clay_data)
geneA <- clay_data$Gene110
geneB <- clay_data$Gene147
geneC <- clay_data$Gene292
install.packages('mvnormtest')
library(mvnormtest)
three_genes <- t(as.matrix(cbind(clay_data$Gene110,clay_data$Gene147,
clay_data$Gene292)))
mshapiro.test(three_genes)
microbiota <- clay_data$Microbiota
clay_manova <- manova(cbind(geneA, geneB, geneC) ~ microbiota)
summary(clay_manova, test = "Pillai")
summary(clay_manova, test = "Wilks")
summary(clay_manova, test = "Hotelling-Lawley")
summary(clay_manova, test = “Roy")
summary.aov(clay_manova)
summary.aov()
do, and what does this tell you about the data?geno_micro <- clay_data$Geno.Micro
clay_manova <- manova(cbind(geneA, geneB, geneC) ~ geno_micro)
summary(clay_manova, test = "Pillai")
summary(clay_manova, test = "Wilks")
summary(clay_manova, test = "Hotelling-Lawley")
summary(clay_manova, test = "Roy")
summary.aov(clay_manova)
microbiota <- clay_data$Microbiota
genotype <- clay_data$Genotype
clay_manova <- manova(cbind(geneA, geneB, geneC) ~ genotype*microbiota)
summary(clay_manova, test = "Pillai")
summary(clay_manova, test = "Wilks")
summary(clay_manova, test = "Hotelling-Lawley")
summary(clay_manova, test = "Roy")
summary.aov(clay_manova)
Fit the MANOVA model
clay_manova <- manova(cbind(clay_data$Gene5,clay_data$Gene6,
clay_data$Gene7,clay_data$Gene8,clay_data$Gene9,
clay_data$Gene10,clay_data$Gene11,clay_data$Gene12,
clay_data$Gene13,clay_data$Gene14)~genotype*microbiota)
summary(clay_manova, test = "Pillai")
summary(clay_manova, test = "Wilks")
summary(clay_manova, test = "Hotelling-Lawley")
summary(clay_manova, test = "Roy")
summary.aov(clay_manova)
Both focus on the analysis of a factor variable but with multiple continuous response variables
MANOVA - explicitly test the difference between levels of a factor(s) using the first discriminant function
DFA - defines the set of linear discriminant functions that most clearly differentiate the levels of a factor variable in multivariate space
MANOVA focuses on hypothesis testing and DFA is used more for classification
predict()
output contain?Here we have 200 patients and their prior histology-based diagnoses. There are three diagnoses (“benign,” “sarcoma 1,” and “sarcoma 2”). Also included are data from a panel of 6 qPCR-based biomarkers. This set of biomarkers is considered largely congruent with histology for cancer diagnosis, but more accurate. Specifically, the histology approach incorrectly leads to a sarcoma diagnosis (when the tumor is actually benign) about 2% of the time, whereas the biomarker approach has no such problem.
The goal here is to use DFA to separate sarcoma from benign diagnoses via the 6 biomarkers, but using the prior histology diagnosis as an informative grouping variable. This way we can evaluate whether any of the prior diagnoses may be erroneous.
lda()
function to find the discriminant functions.diag <- biomarkers$diagnosis
library(MASS)
diag_lda <- lda(diag ~ mark1 + mark2 + mark3 + mark4 + mark5 + mark6)
diag_lda
LD_scores <- as.data.frame(predict(diag_lda)$x)
ldahist(LD_scores$LD1, biomarkers$diagnosis)#For LD1
ldahist(LD_scores$LD2, biomarkers$diagnosis)#For LD2
LD_scores$diag <- diag
#Adds the original diagnosis variable to the LD data frame
plot(LD_scores$LD2~LD_scores$LD1,
main="Prior diagnosis groups in 6-D biomarker space”)
ifelse()
functional statements).