Multiple testing

Peter Ralph

Advanced Biological Statistics

Multiple comparisons

A silly example

Suppose 100 people did 100 well-executed experiments to ask if snails move faster while listening to metal than to mozart.

How many would find a statistically significant difference at \(p < 0.05\)?

Would any find a large effect size?

A less silly example

Suppose someone conducts a well-controlled study that records coronavirus infection rates and the mean daily consumption of 100 different foods in a bunch of people.

How many of the foods would be statistically significantly associated with lower infection rates at \(p < 0.05\)?

Would any have a large effect size?

The problem

A \(p\)-value is

the probability of seeing something at least as extreme as what was seen in the data, if the null hypothesis were true.

So, if the null hypothesis is true, then by definition, \(p\)-values are uniformly distributed between 0 and 1,

and so \(\P\{ p < 0.05 \} = 0.05\).

The Bonferroni Correction

A cutoff of \(p < 0.05\) ensures you should not wrongly reject the null hypothesis more than 5% of the time.

But, if you do \(n\) different tests, all at once?

To keep the probability of not wrongly rejecting any of the \(n\) null hypotheses to 5%, take a cutoff of \(p < 0.05/n\).

Example: Bonferroni

tp <- replicate(1000, t.test(rnorm(20))$p.value)
layout(t(1:2))
hist(tp, breaks=40, xlab='p-value')
plot(sort(tp), xlim=c(1,100), ylim=c(0, 0.1), ylab='p-values, sorted')
abline(h=c(0.05, 0.05/length(tp)), col=1:2)
legend("topright", lty=1, col=1:2, legend=paste("p=", c(0.05, 0.05/length(tp))))

plot of chunk r null

False Discovery Rate

To tolerate some errors, control the false discovery rate.

Suppose you test 1,000 different songs to see if they make snails go faster, obtain a \(p\)-value for each song, set a threshold \(p_0\), and study further those with \(p < p_0\).

We expect* no false positives if \(p_0 = 0.05 / 1000 = 0.00005\) (Bonferroni). (\(*\)at the 5% level)

If \(p_0\) is set to have a 5% false discovery rate,
and 100 songs fall below the threshold,
then we expect about 5 of these to be false positives.

> help(p.adjust)

p.adjust                 package:stats                 R Documentation

Adjust P-values for Multiple Comparisons

Description:

     Given a set of p-values, returns p-values adjusted using one of
     several methods.

Usage:

     p.adjust(p, method = p.adjust.methods, n = length(p))
     
     p.adjust.methods
     # c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
     #   "fdr", "none")
     
Arguments:

       p: numeric vector of p-values (possibly with ‘NA’s).  Any other
          R object is coerced by ‘as.numeric’.

  method: correction method, a ‘character’ string.  Can be abbreviated.

       n: number of comparisons, must be at least ‘length(p)’; only set
          this (to non-default) when you know what you are doing!

Details:

     The adjustment methods include the Bonferroni correction
     (‘"bonferroni"’) in which the p-values are multiplied by the
     number of comparisons.  Less conservative corrections are also
     included by Holm (1979) (‘"holm"’), Hochberg (1988)
     (‘"hochberg"’), Hommel (1988) (‘"hommel"’), Benjamini & Hochberg
     (1995) (‘"BH"’ or its alias ‘"fdr"’), and Benjamini & Yekutieli
     (2001) (‘"BY"’), respectively.  A pass-through option (‘"none"’)
     is also included.  The set of methods are contained in the
     ‘p.adjust.methods’ vector for the benefit of methods that need to
     have the method as an option and pass it on to ‘p.adjust’.

     The first four methods are designed to give strong control of the
     family-wise error rate.  There seems no reason to use the
     unmodified Bonferroni correction because it is dominated by Holm's
     method, which is also valid under arbitrary assumptions.

     Hochberg's and Hommel's methods are valid when the hypothesis
     tests are independent or when they are non-negatively associated
     (Sarkar, 1998; Sarkar and Chang, 1997).  Hommel's method is more
     powerful than Hochberg's, but the difference is usually small and
     the Hochberg p-values are faster to compute.

     The ‘"BH"’ (aka ‘"fdr"’) and ‘"BY"’ methods of Benjamini,
     Hochberg, and Yekutieli control the false discovery rate, the
     expected proportion of false discoveries amongst the rejected
     hypotheses.  The false discovery rate is a less stringent
     condition than the family-wise error rate, so these methods are
     more powerful than the others.

     Note that you can set ‘n’ larger than ‘length(p)’ which means the
     unobserved p-values are assumed to be greater than all the
     observed p for ‘"bonferroni"’ and ‘"holm"’ methods and equal to 1
     for the other methods.

Value:

     A numeric vector of corrected p-values (of the same length as ‘p’,
     with names copied from ‘p’).

References:

     Benjamini, Y., and Hochberg, Y. (1995).  Controlling the false
     discovery rate: a practical and powerful approach to multiple
     testing.  _Journal of the Royal Statistical Society Series B_,
     *57*, 289-300.  <URL: http://www.jstor.org/stable/2346101>.

     Benjamini, Y., and Yekutieli, D. (2001).  The control of the false
     discovery rate in multiple testing under dependency.  _Annals of
     Statistics_, *29*, 1165-1188.  doi: 10.1214/aos/1013699998 (URL:
     https://doi.org/10.1214/aos/1013699998).

     Holm, S. (1979).  A simple sequentially rejective multiple test
     procedure.  _Scandinavian Journal of Statistics_, *6*, 65-70.
     <URL: http://www.jstor.org/stable/4615733>.

     Hommel, G. (1988).  A stagewise rejective multiple test procedure
     based on a modified Bonferroni test.  _Biometrika_, *75*, 383-386.
     doi: 10.2307/2336190 (URL: https://doi.org/10.2307/2336190).

     Hochberg, Y. (1988).  A sharper Bonferroni procedure for multiple
     tests of significance.  _Biometrika_, *75*, 800-803.  doi:
     10.2307/2336325 (URL: https://doi.org/10.2307/2336325).

     Shaffer, J. P. (1995).  Multiple hypothesis testing.  _Annual
     Review of Psychology_, *46*, 561-584.  doi:
     10.1146/annurev.ps.46.020195.003021 (URL:
     https://doi.org/10.1146/annurev.ps.46.020195.003021).  (An
     excellent review of the area.)

     Sarkar, S. (1998).  Some probability inequalities for ordered MTP2
     random variables: a proof of Simes conjecture.  _Annals of
     Statistics_, *26*, 494-504.  doi: 10.1214/aos/1028144846 (URL:
     https://doi.org/10.1214/aos/1028144846).

     Sarkar, S., and Chang, C. K. (1997).  The Simes method for
     multiple hypothesis testing with positively dependent test
     statistics.  _Journal of the American Statistical Association_,
     *92*, 1601-1608.  doi: 10.2307/2965431 (URL:
     https://doi.org/10.2307/2965431).

     Wright, S. P. (1992).  Adjusted P-values for simultaneous
     inference.  _Biometrics_, *48*, 1005-1013.  doi: 10.2307/2532694
     (URL: https://doi.org/10.2307/2532694).  (Explains the adjusted
     P-value approach.)

See Also:

     ‘pairwise.*’ functions such as ‘pairwise.t.test’.

Examples:

     require(graphics)
     
     set.seed(123)
     x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
     p <- 2*pnorm(sort(-abs(x)))
     
     round(p, 3)
     round(p.adjust(p), 3)
     round(p.adjust(p, "BH"), 3)
     
     ## or all of them at once (dropping the "fdr" alias):
     p.adjust.M <- p.adjust.methods[p.adjust.methods != "fdr"]
     p.adj    <- sapply(p.adjust.M, function(meth) p.adjust(p, meth))
     p.adj.60 <- sapply(p.adjust.M, function(meth) p.adjust(p, meth, n = 60))
     stopifnot(identical(p.adj[,"none"], p), p.adj <= p.adj.60)
     round(p.adj, 3)
     ## or a bit nicer:
     noquote(apply(p.adj, 2, format.pval, digits = 3))
     
     
     ## and a graphic:
     matplot(p, p.adj, ylab="p.adjust(p, meth)", type = "l", asp = 1, lty = 1:6,
             main = "P-value adjustments")
     legend(0.7, 0.6, p.adjust.M, col = 1:6, lty = 1:6)
     
     ## Can work with NA's:
     pN <- p; iN <- c(46, 47); pN[iN] <- NA
     pN.a <- sapply(p.adjust.M, function(meth) p.adjust(pN, meth))
     ## The smallest 20 P-values all affected by the NA's :
     round((pN.a / p.adj)[1:20, ] , 4)

Example: False Discovery Rate

layout(t(1:2))
plot(sort(tp), xlim=c(1,100), ylim=c(0, 1.0), ylab='p-values, sorted', main='p')
abline(h=0.05, col=1:2)
plot(sort(p.adjust(tp, method='fdr')), xlim=c(1,100), ylim=c(0, 1.0), ylab='FDR-adjusted p-values, sorted', main='5% FDR')
abline(h=0.05, col=1:2)

plot of chunk r null2

Exercise:

Modify the code so that 20 of the datasets have a mean of \(\mu=1\) (not zero, like below). See how many of the \(p\)-values are below 0.05

with no correction
with Bonerroni correction
with an FDR correction

tp <- replicate(1000, t.test(rnorm(20))$p.value)
sprintf("%d of the %d p-values are below 0.05.", sum(tp < 0.05), length(tp))
layout(t(1:2))
hist(tp, breaks=40, xlab='p-value')
plot(sort(tp), xlim=c(1,100), ylim=c(0, 0.1), ylab='p-values, sorted')
abline(h=c(0.05, 0.05/length(tp)), col=1:2)
legend("topright", lty=1, col=1:2, legend=paste("p=", c(0.05, 0.05/length(tp))))

Many linear models

Gene expression levels

From Host Genotype and Microbiota Contribute Asymmetrically to Transcriptional Variation in the Threespine Stickleback Gut Clayton M. Small, Kathryn Milligan-Myhre, Susan Bassham, Karen Guillemin, William A. Cresko. Genome Biology and Evolution, March 2017.

Metadata: CVvsGF_RNAseq_Metadata.tsv (direct link)
RNA-Seq: CVvsGF_RNAseq_CPM.tsv (direct link)

Study design:

The data

fish <- read.table("../Datasets/stickleback_GFvsCV_RNAseq/CVvsGF_RNAseq_Metadata.tsv",
                   header=TRUE, sep='\t')
tmp <- read.table("../Datasets/stickleback_GFvsCV_RNAseq/CVvsGF_RNAseq_CPM.tsv",
                  header=TRUE, sep='\t', stringsAsFactors=FALSE, check.names=FALSE)
genes <- tmp[,1:5]
expression <- as.matrix(tmp[,6:ncol(tmp)])
# consistency check
stopifnot(all(match(colnames(expression), fish$Individual) == 1:nrow(fish)))

There are 8506 genes whose expression is measured in 84 fish.

The fish

fish

##    Individual Population Family Treatment Sex Flask
## 1       1A_02         FW    FW1        CV   M    1A
## 2       1A_03         FW    FW1        CV   F    1A
## 3       1A_04         FW    FW1        CV   F    1A
## 4       1A_05         FW    FW1        CV   M    1A
## 5       1A_08         FW    FW1        CV   F    1A
## 6       1A_10         FW    FW1        CV   M    1A
## 7       1D_01         FW    FW1        GF   F    1D
## 8       1D_02         FW    FW1        GF   M    1D
## 9       1D_04         FW    FW1        GF   M    1D
## 10      1D_06         FW    FW1        GF   F    1D
## 11      1D_07         FW    FW1        GF   M    1D
## 12      1D_08         FW    FW1        GF   F    1D
## 13      3A_01         FW    FW3        CV   F    3A
## 14      3A_02         FW    FW3        CV   M    3A
## 15      3A_04         FW    FW3        CV   M    3A
## 16      3A_06         FW    FW3        CV   F    3A
## 17      3A_08         FW    FW3        CV   M    3A
## 18      3A_10         FW    FW3        CV   F    3A
## 19      3C_01         FW    FW3        GF   M    3C
## 20      3C_03         FW    FW3        GF   F    3C
## 21      3C_04         FW    FW3        GF   M    3C
## 22      3C_08         FW    FW3        GF   M    3C
## 23      3C_09         FW    FW3        GF   F    3C
## 24      3C_10         FW    FW3        GF   F    3C
## 25      4B_04         FW    FW4        CV   F    4B
## 26      4B_06         FW    FW4        CV   M    4B
## 27      4B_07         FW    FW4        CV   F    4B
## 28      4B_08         FW    FW4        CV   F    4B
## 29      4B_09         FW    FW4        CV   M    4B
## 30      4B_10         FW    FW4        CV   M    4B
## 31      4C_03         FW    FW4        CV   M    4C
## 32      4C_04         FW    FW4        CV   M    4C
## 33      4C_05         FW    FW4        CV   F    4C
## 34      4C_07         FW    FW4        CV   F    4C
## 35      4C_08         FW    FW4        CV   M    4C
## 36      4C_14         FW    FW4        CV   F    4C
## 37      4D_01         FW    FW4        GF   F    4D
## 38      4D_05         FW    FW4        GF   M    4D
## 39      4D_07         FW    FW4        GF   F    4D
## 40      4D_08         FW    FW4        GF   M    4D
## 41      4D_09         FW    FW4        GF   F    4D
## 42      4D_13         FW    FW4        GF   M    4D
## 43      4F_01         FW    FW4        GF   F    4F
## 44      4F_02         FW    FW4        GF   F    4F
## 45      4F_03         FW    FW4        GF   M    4F
## 46      4F_04         FW    FW4        GF   M    4F
## 47      4F_06         FW    FW4        GF   M    4F
## 48      4F_08         FW    FW4        GF   F    4F
## 49      5A_01         OC    OC5        CV   M    5A
## 50      5A_02         OC    OC5        CV   F    5A
## 51      5A_03         OC    OC5        CV   M    5A
## 52      5A_04         OC    OC5        CV   M    5A
## 53      5A_05         OC    OC5        CV   F    5A
## 54      5A_08         OC    OC5        CV   M    5A
## 55      5B_01         OC    OC5        CV   F    5B
## 56      5B_02         OC    OC5        CV   F    5B
## 57      5B_04         OC    OC5        CV   M    5B
## 58      5B_06         OC    OC5        CV   M    5B
## 59      5B_08         OC    OC5        CV   M    5B
## 60      5B_10         OC    OC5        CV   F    5B
## 61      5C_01         OC    OC5        GF   M    5C
## 62      5C_02         OC    OC5        GF   F    5C
## 63      5C_03         OC    OC5        GF   M    5C
## 64      5C_05         OC    OC5        GF   F    5C
## 65      5C_07         OC    OC5        GF   M    5C
## 66      5C_09         OC    OC5        GF   M    5C
## 67      5D_01         OC    OC5        GF   F    5D
## 68      5D_04         OC    OC5        GF   F    5D
## 69      5D_05         OC    OC5        GF   M    5D
## 70      5D_07         OC    OC5        GF   F    5D
## 71      5D_09         OC    OC5        GF   M    5D
## 72      5D_11         OC    OC5        GF   M    5D
## 73      6A_01         OC    OC6        CV   M    6A
## 74      6A_02         OC    OC6        CV   F    6A
## 75      6A_04         OC    OC6        CV   F    6A
## 76      6A_05         OC    OC6        CV   M    6A
## 77      6A_07         OC    OC6        CV   F    6A
## 78      6A_10         OC    OC6        CV   M    6A
## 79      6C_01         OC    OC6        GF   M    6C
## 80      6C_03         OC    OC6        GF   M    6C
## 81      6C_04         OC    OC6        GF   F    6C
## 82      6C_06         OC    OC6        GF   M    6C
## 83      6C_09         OC    OC6        GF   F    6C
## 84      6C_14         OC    OC6        GF   F    6C

Expression levels

plot of chunk r plot_data

Normalize

To put coefficients on the same scale:

expr <- sweep(expression, 1, rowMeans(expression), "/")

plot of chunk r plotdata2

Fit lots of models:

pop_lms <- apply(expr, 1, function (x) (lm(x ~ Population, data=fish)))
all_lms <- apply(expr, 1, function (x) (lm(x ~ Population + Treatment + Sex, data=fish)))
anovas <- mapply(anova, pop_lms, all_lms, SIMPLIFY=FALSE)

and extract coefficients, \(p\)-values

pop_coefs <- sapply(pop_lms, coef)
all_coefs <- sapply(all_lms, coef)
pvals <- sapply(lapply(anovas, "[[", "Pr(>F)"), "[", 2)

The \(p\)-values

… for an ANOVA comparing

    gene expression ~ Population
    gene expression ~ Population + Treatment + Sex

hist(pvals, breaks=500, xlab='p-values')

plot of chunk r show_pvals

1158 \(p\)-values are less than 0.05.

But, out of 8506, we’d expect about 425 to be less than 0.05.

Let’s look at the three coefficients for all the models:

    x ~ Population + Treatment + Sex

Coefficents, with \(p < 0.05\) in red: plot of chunk r signif

Coefficents, with \(p < 0.05/n\) in red (Bonferroni!): plot of chunk r signif2

The Bonferroni Bunch

subset(genes, pvals < 0.05/nrow(genes))

##                 Gene_ID   Genome_Loc Gene_Start_bp                                                                                                                       Gene_Description
## 379  ENSGACG00000001729 scaffold_882          1249                                                                   *interleukin-8 [Lateolabrax japonicus];ABI48894 [Source:TopBlasxHit]
## 594  ENSGACG00000002048     groupXXI       2362051                                                       RNA-binding region (RNP1, RRM) containing 3 [Source:ZFIN;Acc:ZDB-GENE-060312-35]
## 833  ENSGACG00000002397       groupV        842100                                                                         stearoyl-CoA desaturase b [Source:ZFIN;Acc:ZDB-GENE-050522-12]
## 985  ENSGACG00000003874     groupXIX       5074262                                                                                    parvin, beta [Source:ZFIN;Acc:ZDB-GENE-030131-4411]
## 1005 ENSGACG00000003899     groupXIX       5087963                                                                                     parvin, gamma [Source:ZFIN;Acc:ZDB-GENE-070410-67]
## 1014 ENSGACG00000003911     groupXIX       5234537                                                                                         plexin b2b [Source:ZFIN;Acc:ZDB-GENE-080902-1]
## 1028 ENSGACG00000003928     groupXIX       5268421                                                        tubulin, gamma complex associated protein 6 [Source:HGNC Symbol;Acc:HGNC:18127]
## 1042 ENSGACG00000003947     groupXIX       5289965            adaptor protein, phosphotyrosine interaction, PH domain and leucine zipper containing 2 [Source:ZFIN;Acc:ZDB-GENE-081016-2]
## 1102 ENSGACG00000004020     groupXIX       5466205                                                                       transmembrane protein 117 [Source:ZFIN;Acc:ZDB-GENE-040426-2809]
## 1109 ENSGACG00000004028     groupXIX       5497683                                                                   twinfilin actin-binding protein 1 [Source:HGNC Symbol;Acc:HGNC:9620]
## 1128 ENSGACG00000004050     groupXIX       5505366                                                       interleukin-1 receptor-associated kinase 4 [Source:ZFIN;Acc:ZDB-GENE-040426-738]
## 1138 ENSGACG00000004063     groupXIX       5536933                                         ADAM metallopeptidase with thrombospondin type 1 motif, 20 [Source:HGNC Symbol;Acc:HGNC:17178]
## 1155 ENSGACG00000004088     groupXIX       5663914                                                                    prickle homolog 1a (Drosophila) [Source:ZFIN;Acc:ZDB-GENE-030724-5]
## 1161 ENSGACG00000004098     groupXIX       5670651                                                                                       periphilin 1 [Source:HGNC Symbol;Acc:HGNC:19369]
## 1162 ENSGACG00000004099     groupXIX       5675242                                                                                                                                       
## 1165 ENSGACG00000004103     groupXIX       5687641                                                                          YY1 associated factor 2 [Source:ZFIN;Acc:ZDB-GENE-041210-115]
## 1172 ENSGACG00000004112     groupXIX       5691367                                                                  glucoside xylosyltransferase 1b [Source:ZFIN;Acc:ZDB-GENE-041210-116]
## 1185 ENSGACG00000004129     groupXIX       5819632                                                                                         contactin 1 [Source:HGNC Symbol;Acc:HGNC:2171]
## 1193 ENSGACG00000004140     groupXIX       5932949                                                     patatin-like phospholipase domain containing 8 [Source:HGNC Symbol;Acc:HGNC:28900]
## 1195 ENSGACG00000004142     groupXIX       5949691                                                     DnaJ (Hsp40) homolog, subfamily B, member 9a [Source:ZFIN;Acc:ZDB-GENE-050626-115]
## 1197 ENSGACG00000004145     groupXIX       5951392                                                                                  dynamin 1-like [Source:ZFIN;Acc:ZDB-GENE-040426-1556]
## 1254 ENSGACG00000004208     groupXIX       5968125                                                                                     caldesmon 1b [Source:ZFIN;Acc:ZDB-GENE-090313-229]
## 1261 ENSGACG00000004215     groupXIX       5984689                                                                   2,3-bisphosphoglycerate mutase [Source:ZFIN;Acc:ZDB-GENE-040718-375]
## 1304 ENSGACG00000004276     groupXIX       6011659                                                       CCR4-NOT transcription complex, subunit 4a [Source:ZFIN;Acc:ZDB-GENE-090313-262]
## 1338 ENSGACG00000004329     groupXIX       6085324                                                                         lactate dehydrogenase Bb [Source:ZFIN;Acc:ZDB-GENE-040718-176]
## 1341 ENSGACG00000004333     groupXIX       6090639                                                                              golgi transport 1Ba [Source:ZFIN;Acc:ZDB-GENE-041210-157]
## 1355 ENSGACG00000004350     groupXIX       6093858                                                             solute carrier family 35, member B4 [Source:ZFIN;Acc:ZDB-GENE-030131-2457]
## 1359 ENSGACG00000004357     groupXIX       6098400                                        coiled-coil-helix-coiled-coil-helix domain containing 3b [Source:ZFIN;Acc:ZDB-GENE-030131-4476]
## 1381 ENSGACG00000004380     groupXIX       6111536                                                                neuroepithelial cell transforming 1 [Source:HGNC Symbol;Acc:HGNC:14592]
## 1384 ENSGACG00000004384     groupXIX       6122297                                                       ankyrin repeat and SOCS box containing 13b [Source:ZFIN;Acc:ZDB-GENE-091118-116]
## 1426 ENSGACG00000004439     groupXIX       6183509                                            dehydrogenase E1 and transketolase domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-041212-44]
## 1463 ENSGACG00000004485     groupXIX       6201210                                                 calcium/calmodulin-dependent protein kinase 1Db [Source:ZFIN;Acc:ZDB-GENE-070112-1872]
## 1473 ENSGACG00000004498     groupXIX       6213441                                                                      cyclin-dependent kinase 16 [Source:ZFIN;Acc:ZDB-GENE-030131-2939]
## 1490 ENSGACG00000004521     groupXIX       6234660                                                                                                                                       
## 1518 ENSGACG00000004555     groupXIX       6253382                                                                                           netrin 4 [Source:ZFIN;Acc:ZDB-GENE-050310-1]
## 1534 ENSGACG00000004578     groupXIX       6269682                                                                  ubiquitin specific peptidase 3 [Source:ZFIN;Acc:ZDB-GENE-030131-5142]
## 1559 ENSGACG00000004613     groupXIX       6294749                                                             mannosidase, alpha, class 2C, member 1 [Source:ZFIN;Acc:ZDB-GENE-101103-4]
## 1609 ENSGACG00000004670     groupXIX       6301852                                                           nei endonuclease VIII-like 1 (E. coli) [Source:ZFIN;Acc:ZDB-GENE-040426-994]
## 1612 ENSGACG00000004675     groupXIX       6304507                                                                         COMM domain containing 4 [Source:ZFIN;Acc:ZDB-GENE-060929-600]
## 1616 ENSGACG00000004680     groupXIX       6310208                                                                                   semaphorin 7A [Source:ZFIN;Acc:ZDB-GENE-030131-3633]
## 1627 ENSGACG00000004691     groupXIX       6319533                                       immunoglobulin superfamily containing leucine-rich repeat 2 [Source:ZFIN;Acc:ZDB-GENE-050320-95]
## 1630 ENSGACG00000004695     groupXIX       6323877                                                                    stimulated by retinoic acid 6 [Source:ZFIN;Acc:ZDB-GENE-060616-252]
## 1651 ENSGACG00000004724     groupXIX       6381558                                                                          stomatin (EPB72)-like 1 [Source:ZFIN;Acc:ZDB-GENE-070209-241]
## 1667 ENSGACG00000004744     groupXIX       6392993                                                             hexosaminidase A (alpha polypeptide) [Source:ZFIN;Acc:ZDB-GENE-050417-283]
## 1671 ENSGACG00000004748       groupI        474993                                                       eukaryotic translation initiation factor 2A [Source:ZFIN;Acc:ZDB-GENE-050626-52]
## 1710 ENSGACG00000004795     groupXIX       6430393                                                       phosphopantothenoylcysteine decarboxylase [Source:ZFIN;Acc:ZDB-GENE-040426-1749]
## 1720 ENSGACG00000004806     groupXIX       6434587                                                                 3-hydroxyacyl-CoA dehydratase 3 [Source:ZFIN;Acc:ZDB-GENE-040426-1200]
## 1739 ENSGACG00000004827     groupXIX       6437512                                                     von Willebrand factor A domain containing 9 [Source:ZFIN;Acc:ZDB-GENE-030131-5804]
## 1772 ENSGACG00000004867     groupXIX       6453754                                                                   DENN/MADD domain containing 4A [Source:ZFIN;Acc:ZDB-GENE-060503-285]
## 1798 ENSGACG00000004903     groupXIX       6544984                                                       mitogen-activated protein kinase kinase 1 [Source:ZFIN;Acc:ZDB-GENE-040426-2759]
## 1939 ENSGACG00000005078     groupXIX       6569082                                               small nuclear RNA activating complex, polypeptide 5 [Source:ZFIN;Acc:ZDB-GENE-041111-50]
## 1946 ENSGACG00000005085     groupXIX       6609904                                                                            SMAD family member 6b [Source:ZFIN;Acc:ZDB-GENE-050419-198]
## 1951 ENSGACG00000005092     groupXIX       6644849                                                                              SMAD family member 3b [Source:ZFIN;Acc:ZDB-GENE-030128-4]
## 1975 ENSGACG00000005119     groupXIX       6652728                                                         alpha- and gamma-adaptin binding protein [Source:ZFIN;Acc:ZDB-GENE-040718-120]
## 2009 ENSGACG00000005167     groupXIX       6676365                                                                                       zgc:162898 [Source:ZFIN;Acc:ZDB-GENE-070410-107]
## 2015 ENSGACG00000005176     groupXIX       6715666                                                          protein inhibitor of activated STAT, 1b [Source:ZFIN;Acc:ZDB-GENE-050419-202]
## 2020 ENSGACG00000005181     groupXIX       6726200                                                                        mortality factor 4 like 1 [Source:ZFIN;Acc:ZDB-GENE-040718-348]
## 2072 ENSGACG00000005252     groupXIX       6844584                                                            damage-specific DNA binding protein 2 [Source:ZFIN;Acc:ZDB-GENE-050419-169]
## 2075 ENSGACG00000005255     groupXIX       6848563                                                   kelch repeat and BTB (POZ) domain containing 4 [Source:ZFIN;Acc:ZDB-GENE-040426-937]
## 2101 ENSGACG00000005293     groupXIX       6862001                                                              immunoglobulin mu binding protein 2 [Source:ZFIN;Acc:ZDB-GENE-050419-258]
## 2127 ENSGACG00000005331     groupXIX       6868449                                                                   chitinase domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-030131-9169]
## 2145 ENSGACG00000005355     groupXIX       6872104                                                           Parkinson disease 7 domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-051030-96]
## 2155 ENSGACG00000005365     groupXIX       6873765                                                                                   CD151 molecule [Source:ZFIN;Acc:ZDB-GENE-041010-137]
## 2180 ENSGACG00000005397     groupXIX       6878614                                                                                si:ch211-247i17.1 [Source:ZFIN;Acc:ZDB-GENE-131121-275]
## 2182 ENSGACG00000005399     groupXIX       6880605                                                   calcium release activated channel regulator 2B [Source:ZFIN;Acc:ZDB-GENE-061215-136]
## 2187 ENSGACG00000005406     groupXIX       6886885                                                                          transmembrane protein 138 [Source:ZFIN;Acc:ZDB-GENE-120912-1]
## 2194 ENSGACG00000005414     groupXIX       6889077                                                                       transmembrane protein 258 [Source:ZFIN;Acc:ZDB-GENE-040426-1739]
## 2195 ENSGACG00000005416     groupXIX       6895042                                                                          myelin regulatory factor [Source:ZFIN;Acc:ZDB-GENE-080204-57]
## 2209 ENSGACG00000005436     groupXIX       6921327                                                                                si:ch1073-89b12.1 [Source:ZFIN;Acc:ZDB-GENE-131121-340]
## 2233 ENSGACG00000005468     groupXIX       6952818                                                                                 synaptotagmin VIIa [Source:ZFIN;Acc:ZDB-GENE-090601-5]
## 2243 ENSGACG00000005483     groupXIX       7010663                                                                                                                                       
## 2248 ENSGACG00000005489     groupXIX       7013605                                                                                   si:dkey-201c1.2 [Source:ZFIN;Acc:ZDB-GENE-110408-61]
## 2263 ENSGACG00000005509     groupXIX       7017971                                                                                                                                       
## 2268 ENSGACG00000005514     groupXIX       7023483                                                               p53-induced death domain protein 1 [Source:ZFIN;Acc:ZDB-GENE-081104-353]
## 2288 ENSGACG00000005541     groupXIX       7040298                                                                                   zmp:0000001167 [Source:ZFIN;Acc:ZDB-GENE-140106-127]
## 2307 ENSGACG00000005561     groupXIX       7055587                                                              ATH1, acid trehalase-like 1 (yeast) [Source:ZFIN;Acc:ZDB-GENE-061103-319]
## 2345 ENSGACG00000005613     groupXIX       7078941                                                          RAB3A interacting protein (rabin3)-like 1 [Source:ZFIN;Acc:ZDB-GENE-110921-5]
## 2361 ENSGACG00000005632     groupXIX       7091066                                                                       Hermansky-Pudlak syndrome 5 [Source:ZFIN;Acc:ZDB-GENE-070410-80]
## 2377 ENSGACG00000005655       groupX       8128412                                                          small nuclear ribonucleoprotein 40 (U5) [Source:ZFIN;Acc:ZDB-GENE-040426-978]
## 2381 ENSGACG00000005659     groupXIX       7099791                                                  general transcription factor IIH, polypeptide 1 [Source:ZFIN;Acc:ZDB-GENE-040912-164]
## 2445 ENSGACG00000005940     groupXIX       7354155                                                                        fin bud initiation factor a [Source:ZFIN;Acc:ZDB-GENE-111031-2]
## 2473 ENSGACG00000005974     groupXIX       7374591                                    caseinolytic mitochondrial matrix peptidase chaperone subunit b [Source:ZFIN;Acc:ZDB-GENE-130404-1]
## 2481 ENSGACG00000005988     groupXIX       7395960                                               adaptor-related protein complex 4, epsilon 1 subunit [Source:ZFIN;Acc:ZDB-GENE-061221-3]
## 2487 ENSGACG00000005996     groupXIX       7404582                                          guanine nucleotide binding protein (G protein), beta 5a [Source:ZFIN;Acc:ZDB-GENE-070112-342]
## 2490 ENSGACG00000006001     groupXIX       7409406                                                                                        myosin VC [Source:ZFIN;Acc:ZDB-GENE-131127-196]
## 2508 ENSGACG00000006025     groupXIX       7420099                                                                                        myosin VAb [Source:ZFIN;Acc:ZDB-GENE-050411-72]
## 2533 ENSGACG00000006058     groupXIX       7436490                                                               ribosomal L24 domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-040426-1925]
## 2564 ENSGACG00000006101     groupXIX       7477087                                                                            transcription factor 12 [Source:HGNC Symbol;Acc:HGNC:11623]
## 2571 ENSGACG00000006110     groupXIX       7488251                                                                                    cingulin-like 1 [Source:HGNC Symbol;Acc:HGNC:25931]
## 2588 ENSGACG00000006135     groupXIX       7504978                                                                   ADAM metallopeptidase domain 10b [Source:ZFIN;Acc:ZDB-GENE-071115-1]
## 2594 ENSGACG00000006141     groupXIX       7515235                                                              FANCD2/FANCI-associated nuclease 1 [Source:ZFIN;Acc:ZDB-GENE-030131-6225]
## 2613 ENSGACG00000006260     groupXIX       7626845                                                           sulfide quinone reductase-like (yeast) [Source:ZFIN;Acc:ZDB-GENE-050417-436]
## 2628 ENSGACG00000006281     groupXIX       7634611       CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase like 2b [Source:ZFIN;Acc:ZDB-GENE-030131-1809]
## 2656 ENSGACG00000006315     groupXIX       7644684                                                  poly (ADP-ribose) polymerase family, member 16 [Source:ZFIN;Acc:ZDB-GENE-040426-2289]
## 2678 ENSGACG00000006340     groupXIX       7654842                                                                                                                                       
## 2687 ENSGACG00000006351     groupXIX       7669052                                              UDP glucuronosyltransferase 5 family, polypeptide A1 [Source:ZFIN;Acc:ZDB-GENE-051120-60]
## 2813 ENSGACG00000006516     groupXIX       7716217                                                                 cytochrome c oxidase subunit Vaa [Source:ZFIN;Acc:ZDB-GENE-050522-133]
## 2815 ENSGACG00000006521     groupXIX       7718951                                                             A kinase (PRKA) interacting protein 1 [Source:ZFIN;Acc:ZDB-GENE-030829-24]
## 2857 ENSGACG00000006582     groupXIX       7755202                                            proteasome (prosome, macropain) subunit, alpha type, 1 [Source:ZFIN;Acc:ZDB-GENE-040801-15]
## 2915 ENSGACG00000006659     groupXIX       7928238                                                                                        zgc:56106 [Source:ZFIN;Acc:ZDB-GENE-040426-904]
## 2934 ENSGACG00000006687     groupXIX       7934480                                         pleckstrin homology domain containing, family A member 7a [Source:ZFIN;Acc:ZDB-GENE-050419-75]
## 2954 ENSGACG00000008302     groupXIX       9271720                                                                                 synaptotagmin VIII [Source:ZFIN;Acc:ZDB-GENE-060303-4]
## 2962 ENSGACG00000008315     groupXIX       9287888                                           troponin I type 2a (skeletal, fast), tandem duplicate 1 [Source:ZFIN;Acc:ZDB-GENE-041114-60]
## 2981 ENSGACG00000008376     groupXIX       9337694                                                                    lymphocyte-specific protein 1 [Source:ZFIN;Acc:ZDB-GENE-131127-171]
## 3080 ENSGACG00000008494     groupXIX       9448469                                                                                si:ch1073-174d20.2 [Source:ZFIN;Acc:ZDB-GENE-121214-51]
## 3107 ENSGACG00000008531     groupXIX       9501117                                                                 ring finger and WD repeat domain 3 [Source:ZFIN;Acc:ZDB-GENE-120529-1]
## 3113 ENSGACG00000008543     groupXIX       9517063                                        transmembrane emp24 protein transport domain containing 6 [Source:ZFIN;Acc:ZDB-GENE-131121-182]
## 3117 ENSGACG00000008550     groupXIX       9523896                                                                zinc finger, DHHC-type containing 7 [Source:HGNC Symbol;Acc:HGNC:18459]
## 3132 ENSGACG00000008572     groupXIX       9532775                                                             ankyrin repeat domain 27 (VPS9 domain) [Source:ZFIN;Acc:ZDB-GENE-121105-1]
## 3136 ENSGACG00000008577     groupXIX       9535480                                                             ankyrin repeat domain 27 (VPS9 domain) [Source:ZFIN;Acc:ZDB-GENE-121105-1]
## 3152 ENSGACG00000008599     groupXIX       9564658                                                                                        zgc:162267 [Source:ZFIN;Acc:ZDB-GENE-070410-53]
## 3158 ENSGACG00000008607      groupXI       5736635                  signal transducer and activator of transcription 3 (acute-phase response factor) [Source:ZFIN;Acc:ZDB-GENE-980526-68]
## 3163 ENSGACG00000008612     groupXIX       9579090                                                                                                                                       
## 3168 ENSGACG00000008617     groupXIX       9607677                                      protein tyrosine phosphatase, receptor-type, Z polypeptide 1a [Source:ZFIN;Acc:ZDB-GENE-090406-1]
## 3185 ENSGACG00000008638     groupXIX       9640525                                                                 aminoadipate-semialdehyde synthase [Source:ZFIN;Acc:ZDB-GENE-061220-8]
## 3199 ENSGACG00000008655     groupXIX       9666876                                                               Ca++-dependent secretion activator 2 [Source:ZFIN;Acc:ZDB-GENE-030903-1]
## 3221 ENSGACG00000008687     groupXIX       9749129                                                         ankyrin repeat and SOCS box containing 15a [Source:ZFIN;Acc:ZDB-GENE-110421-5]
## 3255 ENSGACG00000008743     groupXIX       9835581                                                                  protection of telomeres 1 homolog [Source:ZFIN;Acc:ZDB-GENE-110324-1]
## 3280 ENSGACG00000008779     groupXIX       9929496                                                                                                                                       
## 3283 ENSGACG00000008783     groupXIX       9933073                                                                                       zgc:101553 [Source:ZFIN;Acc:ZDB-GENE-041114-124]
## 3311 ENSGACG00000008837     groupXIX       9945600                                                                                                                                       
## 3314 ENSGACG00000008843     groupXIX       9951237                                                    FtsJ RNA methyltransferase homolog 1 (E. coli) [Source:ZFIN;Acc:ZDB-GENE-041114-83]
## 3350 ENSGACG00000008898     groupXIX      10047090                                                                                    ceramide kinase [Source:HGNC Symbol;Acc:HGNC:19256]
## 3359 ENSGACG00000008907     groupXIX      10061760                                                                                si:ch211-286k11.4 [Source:ZFIN;Acc:ZDB-GENE-131121-159]
## 3365 ENSGACG00000008914     groupXIX      10072808                                                                       GRAM domain containing 4b [Source:ZFIN;Acc:ZDB-GENE-030131-4780]
## 3376 ENSGACG00000008928     groupXIX      10081037                                                                                                                                       
## 3392 ENSGACG00000008949     groupXIX      10101541                                             Bet1 golgi vesicular membrane trafficking protein-like [Source:ZFIN;Acc:ZDB-GENE-040822-2]
## 3416 ENSGACG00000008981     groupXIX      10122821                                                                      G-2 and S-phase expressed 1 [Source:ZFIN;Acc:ZDB-GENE-050522-493]
## 3441 ENSGACG00000009014     groupXIX      10134903                                                                                                                                       
## 3442 ENSGACG00000009015     groupXIX      10148833                                                                     cortactin binding protein 2 [Source:ZFIN;Acc:ZDB-GENE-030131-8134]
## 3463 ENSGACG00000009039     groupXIX      10185959 cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7) [Source:ZFIN;Acc:ZDB-GENE-050517-20]
## 3512 ENSGACG00000009107     groupXIX      10282175                                             capping protein (actin filament) muscle Z-line, alpha 2 [Source:HGNC Symbol;Acc:HGNC:1490]
## 3583 ENSGACG00000009201     groupXIX      10353716                                                                                      caveolin 1 [Source:ZFIN;Acc:ZDB-GENE-030131-2415]
## 3584 ENSGACG00000009202     groupXIX      10363935                                                                                       caveolin 2 [Source:ZFIN;Acc:ZDB-GENE-040625-164]
## 3590 ENSGACG00000009210     groupXIX      10390061                                                         testis derived transcript (3 LIM domains) [Source:ZFIN;Acc:ZDB-GENE-040718-59]
## 3656 ENSGACG00000009309     groupXIX      10403743                                                                            centrosomal protein 41 [Source:ZFIN;Acc:ZDB-GENE-040704-35]
## 3674 ENSGACG00000009342     groupXIX      10441422                                                                     dual specificity phosphatase 6 [Source:ZFIN;Acc:ZDB-GENE-030613-1]
## 3703 ENSGACG00000009378     groupXIX      10583153                                            transmembrane and tetratricopeptide repeat containing 3 [Source:ZFIN;Acc:ZDB-GENE-061221-2]
## 3727 ENSGACG00000010003     groupXIX      11486565                           phosphatidylinositol-4-phosphate 3-kinase, catalytic subunit type 2 gamma [Source:HGNC Symbol;Acc:HGNC:8973]
## 3736 ENSGACG00000010014     groupXIX      11517961                                           pleckstrin homology domain containing, family A member 5 [Source:HGNC Symbol;Acc:HGNC:30036]
## 3745 ENSGACG00000010024     groupXIX      11574991                                                                               AE binding protein 2 [Source:HGNC Symbol;Acc:HGNC:24051]
## 3747 ENSGACG00000010026    groupXIII       9810875                                                    ankyrin repeat and death domain containing 1B [Source:ZFIN;Acc:ZDB-GENE-060526-136]
## 3752 ENSGACG00000010032     groupXIX      11627393                                                                phosphodiesterase 3A, cGMP-inhibited [Source:HGNC Symbol;Acc:HGNC:8778]
## 3760 ENSGACG00000010042     groupXIX      11688379                                                              B-cell receptor-associated protein 29 [Source:HGNC Symbol;Acc:HGNC:24131]
## 3768 ENSGACG00000010054     groupXIX      11737187                                                                   HMG-box transcription factor 1 [Source:ZFIN;Acc:ZDB-GENE-050522-414]
## 3774 ENSGACG00000010060     groupXIX      11744274                                           protein kinase, cAMP-dependent, regulatory, type II, beta [Source:HGNC Symbol;Acc:HGNC:9392]
## 3790 ENSGACG00000010863     groupXIX      12871147                                                                 Bloom syndrome, RecQ helicase-like [Source:ZFIN;Acc:ZDB-GENE-070702-5]
## 3814 ENSGACG00000010898     groupXIX      12932396                                                                                   alpha-kinase 3a [Source:ZFIN;Acc:ZDB-GENE-050419-48]
## 3847 ENSGACG00000010936     groupXIX      12952501                                                                         zinc finger protein 592 [Source:ZFIN;Acc:ZDB-GENE-030131-9613]
## 3866 ENSGACG00000010963     groupXIX      12980638                                                                                       zgc:153293 [Source:ZFIN;Acc:ZDB-GENE-060825-315]
## 3868 ENSGACG00000010965     groupXIX      12984613                                                                  RAB19, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:19982]
## 3880 ENSGACG00000010978     groupXIX      12985921                                                   cat eye syndrome chromosome region, candidate 5 [Source:ZFIN;Acc:ZDB-GENE-080220-59]
## 3900 ENSGACG00000011004     groupXIX      12995354                                                   Usher syndrome 1C (autosomal recessive, severe) [Source:ZFIN;Acc:ZDB-GENE-060312-41]
## 3929 ENSGACG00000011046     groupXIX      13053541                                                                           MOB kinase activator 2a [Source:ZFIN;Acc:ZDB-GENE-040718-56]
## 3942 ENSGACG00000011062     groupXIX      13172917                              protein tyrosine phosphatase, receptor type, Jb, tandem duplicate 2 [Source:ZFIN;Acc:ZDB-GENE-131120-137]
## 3957 ENSGACG00000011081     groupXIX      13227769                                                                oxysterol binding protein-like 5 [Source:ZFIN;Acc:ZDB-GENE-030131-5872]
## 3997 ENSGACG00000011130     groupXIX      13354738                                                               mitochondrial ribosomal protein L23 [Source:ZFIN;Acc:ZDB-GENE-040625-12]
## 4026 ENSGACG00000011173     groupXIX      13427058                                                                                                                                       
## 4027 ENSGACG00000011175     groupXIX      13430600                                                                  RAB19, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:19982]
## 4031 ENSGACG00000011179     groupXIX      13455597                                                                                        im:6904482 [Source:ZFIN;Acc:ZDB-GENE-050506-81]
## 4084 ENSGACG00000011713     groupXIX      14968720                                                                         early endosome antigen 1 [Source:ZFIN;Acc:ZDB-GENE-041111-270]
## 4091 ENSGACG00000011723     groupXIX      14986398                                      nudix (nucleoside diphosphate linked moiety X)-type motif 4a [Source:ZFIN;Acc:ZDB-GENE-031010-33]
## 4092 ENSGACG00000011725     groupXIX      14992460                                                               ubiquitin-conjugating enzyme E2Nb [Source:ZFIN;Acc:ZDB-GENE-040426-1291]
## 4108 ENSGACG00000011745     groupXIX      15000177                                                  nuclear receptor subfamily 1, group H, member 4 [Source:ZFIN;Acc:ZDB-GENE-040718-313]
## 4126 ENSGACG00000011771     groupXIX      15033199                                                                                  si:dkey-103i16.1 [Source:ZFIN;Acc:ZDB-GENE-060503-47]
## 4213 ENSGACG00000011879     groupXIX      15132156                                                                      cation/H+ exchanger protein 2 [Source:ZFIN;Acc:ZDB-GENE-100825-2]
## 4221 ENSGACG00000011888     groupXIX      15143411                                                                         kelch domain containing 10 [Source:HGNC Symbol;Acc:HGNC:22194]
## 4242 ENSGACG00000011914     groupXIX      15189308                                                                cholinergic receptor, muscarinic 2b [Source:ZFIN;Acc:ZDB-GENE-090410-3]
## 4265 ENSGACG00000011944     groupXIX      15214002                                                                                   si:ch211-127m7.3 [Source:ZFIN;Acc:ZDB-GENE-141211-6]
## 4285 ENSGACG00000011975     groupXIX      15276451                                                                          KxDL motif containing 1 [Source:ZFIN;Acc:ZDB-GENE-040801-207]
## 4331 ENSGACG00000012034     groupXIX      15311956                                                               leucine rich repeat containing 17 [Source:ZFIN;Acc:ZDB-GENE-030131-9774]
## 4344 ENSGACG00000012049     groupXIX      15324801                                                                                 si:ch211-236c15.2 [Source:ZFIN;Acc:ZDB-GENE-120709-53]
## 4346 ENSGACG00000012053     groupXIX      15336362                                                               round spermatid basic protein 1-like [Source:HGNC Symbol;Acc:HGNC:24765]
## 4357 ENSGACG00000012066     groupXIX      15348293                                                                           proline rich 5 (renal) [Source:ZFIN;Acc:ZDB-GENE-130530-791]
## 4359 ENSGACG00000012071     groupXIX      15357367                                                                     RAD52 homolog (S. cerevisiae) [Source:ZFIN;Acc:ZDB-GENE-050731-10]
## 4377 ENSGACG00000012099     groupXIX      15387902                                                        ELKS/RAB6-interacting/CAST family member 1a [Source:ZFIN;Acc:ZDB-GENE-091214-5]
## 4383 ENSGACG00000012110     groupXIX      15399820                                                                                                                                       
## 4462 ENSGACG00000012213     groupXIX      15460729                                      nudix (nucleoside diphosphate linked moiety X)-type motif 7 [Source:ZFIN;Acc:ZDB-GENE-131127-212]
## 4467 ENSGACG00000012221     groupXIX      15481309                                                                                                                                       
## 4492 ENSGACG00000012349     groupXIX      15712132                                                                             choline kinase beta [Source:ZFIN;Acc:ZDB-GENE-030131-2928]
## 4521 ENSGACG00000012386     groupIII         50881                                                                  chemokine (C-X-C motif) receptor 4 [Source:HGNC Symbol;Acc:HGNC:2561]
## 4523 ENSGACG00000012388     groupXIX      15752174                                                                        progastricsin (pepsinogen C) [Source:HGNC Symbol;Acc:HGNC:8890]
## 4545 ENSGACG00000012412     groupXIX      15770641                                                                                  arylsulfatase A [Source:ZFIN;Acc:ZDB-GENE-050320-118]
## 4570 ENSGACG00000012441     groupXIX      15776072                                                                                                                                       
## 4584 ENSGACG00000012456     groupXIX      15776176                                                                                                                                       
## 4586 ENSGACG00000012458     groupXIX      15837060                                                       SH3 and multiple ankyrin repeat domains 3a [Source:ZFIN;Acc:ZDB-GENE-060503-369]
## 4597 ENSGACG00000012474     groupXIX      15918028                                                        RAB, member of RAS oncogene family-like 2 [Source:ZFIN;Acc:ZDB-GENE-060503-464]
## 4638 ENSGACG00000012536     groupXIX      15947984                                                                                                                                       
## 4639 ENSGACG00000012538     groupXIX      15950976                                                                                                                                       
## 4642 ENSGACG00000012541     groupXIX      15963171                                            putative pyruvate dehydrogenase phosphatase isoenzyme 2 [Source:ZFIN;Acc:ZDB-GENE-000921-2]
## 4672 ENSGACG00000012580     groupXIX      16058808                                                receptor-interacting serine-threonine kinase 3 like [Source:ZFIN;Acc:ZDB-GENE-071115-4]
## 4693 ENSGACG00000012612     groupXIX      16082531                                                                     RNA binding motif protein 28 [Source:ZFIN;Acc:ZDB-GENE-040426-960]
## 4719 ENSGACG00000012640     groupXIX      16212950                                                                         hepatocyte growth factor b [Source:ZFIN;Acc:ZDB-GENE-041014-3]
## 4804 ENSGACG00000012758     groupXIX      16545546                                                     family with sequence similarity 107, member B [Source:ZFIN;Acc:ZDB-GENE-031030-12]
## 4853 ENSGACG00000012835     groupXIX      16608508                                                                                                                                       
## 4880 ENSGACG00000012874     groupXIX      16640414                                                     protein phosphatase 6, regulatory subunit 2b [Source:ZFIN;Acc:ZDB-GENE-070705-441]
## 4922 ENSGACG00000013517     groupXIX      18023897                                                         CCR4-NOT transcription complex, subunit 2 [Source:ZFIN;Acc:ZDB-GENE-070410-70]
## 4948 ENSGACG00000013552     groupXIX      18279323                                                               tetratricopeptide repeat domain 38 [Source:ZFIN;Acc:ZDB-GENE-050522-318]
## 4958 ENSGACG00000013564     groupXIX      18286218                                                                                 si:ch211-59c24.1 [Source:ZFIN;Acc:ZDB-GENE-060503-607]
## 4991 ENSGACG00000013611     groupXIX      18485565                                                                                                                                       
## 4996 ENSGACG00000013617     groupXIX      18488096                                                     calcium release activated channel regulator 2A [Source:HGNC Symbol;Acc:HGNC:28657]
## 5002 ENSGACG00000013623     groupXIX      18554402                                                                                   si:dkeyp-2c8.2 [Source:ZFIN;Acc:ZDB-GENE-081031-100]
## 5020 ENSGACG00000013648     groupXIX      18713908                                                         FYVE, RhoGEF and PH domain containing 4a [Source:ZFIN;Acc:ZDB-GENE-050420-347]
## 5056 ENSGACG00000013687     groupXIX      18747194                                                                       WEE1 homolog 2 (S. pombe) [Source:ZFIN;Acc:ZDB-GENE-030131-5682]
## 5064 ENSGACG00000013695     groupXIX      18764501                                                ATPase, H+ transporting, lysosomal, V1 subunit E1a [Source:ZFIN;Acc:ZDB-GENE-041212-51]
## 5071 ENSGACG00000013705     groupXIX      18768929                                          solute carrier family 25 (glutamate carrier), member 18 [Source:ZFIN;Acc:ZDB-GENE-041111-192]
## 5077 ENSGACG00000013714     groupXIX      18804888                                                   cat eye syndrome chromosome region, candidate 1a [Source:ZFIN;Acc:ZDB-GENE-030902-4]
## 5119 ENSGACG00000013768     groupXIX      18860234                                                                 WAP four-disulfide core domain 1 [Source:ZFIN;Acc:ZDB-GENE-070112-352]
## 5124 ENSGACG00000013775     groupXIX      18901520                                           potassium voltage-gated channel, subfamily G, member 4a [Source:ZFIN;Acc:ZDB-GENE-050419-11]
## 5128 ENSGACG00000013779     groupXIX      18943143                                                                                 si:dkey-246g23.2 [Source:ZFIN;Acc:ZDB-GENE-050419-100]
## 5131 ENSGACG00000013784     groupXIX      18973913                                                            heat shock factor binding protein 1b [Source:ZFIN;Acc:ZDB-GENE-040426-1721]
## 5135 ENSGACG00000013788     groupXIX      18976351                                                                                      zgc:173742 [Source:ZFIN;Acc:ZDB-GENE-030131-6489]
## 5209 ENSGACG00000013883     groupXIX      19632903                                                                                       zgc:103697 [Source:ZFIN;Acc:ZDB-GENE-040912-104]
## 5217 ENSGACG00000013896     groupXIX      19648021                                                           ring finger and SPRY domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-061026-2]
## 5220 ENSGACG00000013899     groupXIX      19659023                                                  ADP-ribosylation factor-like 2 binding protein [Source:ZFIN;Acc:ZDB-GENE-040426-1604]
## 5240 ENSGACG00000013931     groupXIX      19748754                                             apoptosis-inducing factor, mitochondrion-associated, 2 [Source:HGNC Symbol;Acc:HGNC:21411]
## 5246 ENSGACG00000013939     groupXIX      19756843                                                                            Bardet-Biedl syndrome 2 [Source:ZFIN;Acc:ZDB-GENE-020801-1]
## 5266 ENSGACG00000013963     groupXIX      19768602                                    UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 9 [Source:ZFIN;Acc:ZDB-GENE-060503-611]
## 5292 ENSGACG00000013996     groupXIX      19940088                                                                                                                                       
## 5353 ENSGACG00000014081     groupXIX      20045284                                          transport and golgi organization 6 homolog (Drosophila) [Source:ZFIN;Acc:ZDB-GENE-050419-237]
## 5359 ENSGACG00000014090     groupXIX      20079687                                           mitogen-activated protein kinase 8 interacting protein 1 [Source:ZFIN;Acc:ZDB-GENE-101025-1]
## 6745 ENSGACG00000018690 scaffold_654          2701                                                                                si:ch1073-89b12.1 [Source:ZFIN;Acc:ZDB-GENE-131121-340]
## 6747 ENSGACG00000018692 scaffold_654          7332                                                                            fatty acid desaturase 2 [Source:ZFIN;Acc:ZDB-GENE-011212-1]
## 7089 ENSGACG00000019136      groupIV      21957750                                                                                       plexin C1 [Source:ZFIN;Acc:ZDB-GENE-030131-1620]
## 7984 ENSGACG00000020294     groupVII      15804561                                                       V-set and immunoglobulin domain containing 1 [Source:HGNC Symbol;Acc:HGNC:28675]
## 8497 ENSGACG00000022681     groupXIX       8017901                                                                                  Small nucleolar RNA SNORA19 [Source:RFAM;Acc:RF00413]
##                     Gene_Symbol
## 379                            
## 594                       rnpc3
## 833                        scdb
## 985                       parvb
## 1005                      parvg
## 1014                    plxnb2b
## 1028                    TUBGCP6
## 1042                      appl2
## 1102                    tmem117
## 1109                       TWF1
## 1128                      irak4
## 1138                   ADAMTS20
## 1155                  prickle1a
## 1161            PPHLN1 (1 of 3)
## 1162                           
## 1165                       yaf2
## 1172                    gxylt1b
## 1185             CNTN1 (1 of 2)
## 1193            PNPLA8 (1 of 2)
## 1195                    dnajb9a
## 1197                      dnm1l
## 1254                     cald1b
## 1261                       bpgm
## 1304                     cnot4a
## 1338                      ldhbb
## 1341                    golt1ba
## 1355                    slc35b4
## 1359                    chchd3b
## 1381              NET1 (1 of 2)
## 1384                     asb13b
## 1426                     dhtkd1
## 1463                    camk1db
## 1473             cdk16 (1 of 2)
## 1490                           
## 1518                       ntn4
## 1534                       usp3
## 1559                     man2c1
## 1609                      neil1
## 1612                     commd4
## 1616                     sema7a
## 1627                      islr2
## 1630                      stra6
## 1651                     stoml1
## 1667                       hexa
## 1671                      eif2a
## 1710                      ppcdc
## 1720                      hacd3
## 1739                       vwa9
## 1772                    dennd4a
## 1798                     map2k1
## 1939                     snapc5
## 1946                     smad6b
## 1951                     smad3b
## 1975                      aagab
## 2009                 zgc:162898
## 2015                     pias1b
## 2020                    morf4l1
## 2072                       ddb2
## 2075                     kbtbd4
## 2101                    ighmbp2
## 2127                      chid1
## 2145                      pddc1
## 2155                      cd151
## 2180          si:ch211-247i17.1
## 2182                    cracr2b
## 2187                    tmem138
## 2194                    tmem258
## 2195                       myrf
## 2209 si:ch1073-89b12.1 (1 of 3)
## 2233                      syt7a
## 2243                           
## 2248            si:dkey-201c1.2
## 2263                           
## 2268                      pidd1
## 2288             zmp:0000001167
## 2307                      athl1
## 2345                    rab3il1
## 2361                       hps5
## 2377                    snrnp40
## 2381                     gtf2h1
## 2445                     fibina
## 2473                      clpxb
## 2481                      ap4e1
## 2487                      gnb5a
## 2490                      myo5c
## 2508                     myo5ab
## 2533                    rsl24d1
## 2564             TCF12 (1 of 2)
## 2571             CGNL1 (1 of 2)
## 2588                    adam10b
## 2594                       fan1
## 2613                      sqrdl
## 2628                   ctdspl2b
## 2656                     parp16
## 2678                           
## 2687                     ugt5a1
## 2813                     cox5aa
## 2815                      akip1
## 2857                      psma1
## 2915                  zgc:56106
## 2934                   plekha7a
## 2954                       syt8
## 2962                   tnni2a.1
## 2981                       lsp1
## 3080         si:ch1073-174d20.2
## 3107                      rfwd3
## 3113                      tmed6
## 3117            ZDHHC7 (1 of 2)
## 3132           ankrd27 (1 of 2)
## 3136           ankrd27 (2 of 2)
## 3152        zgc:162267 (2 of 2)
## 3158                      stat3
## 3163                           
## 3168                    ptprz1a
## 3185                       aass
## 3199                     cadps2
## 3221                     asb15a
## 3255                       pot1
## 3280                           
## 3283                 zgc:101553
## 3311                           
## 3314                      ftsj1
## 3350              CERK (1 of 2)
## 3359          si:ch211-286k11.4
## 3365                    gramd4b
## 3376                           
## 3392                      bet1l
## 3416                      gtse1
## 3441                           
## 3442                    cttnbp2
## 3463                       cftr
## 3512                     CAPZA2
## 3583                       cav1
## 3584                       cav2
## 3590                        tes
## 3656                      cep41
## 3674                      dusp6
## 3703                      tmtc3
## 3727                    PIK3C2G
## 3736           PLEKHA5 (1 of 2)
## 3745             AEBP2 (1 of 2)
## 3747                    ankdd1b
## 3752             PDE3A (2 of 2)
## 3760            BCAP29 (2 of 2)
## 3768                       hbp1
## 3774                    PRKAR2B
## 3790                        blm
## 3814                     alpk3a
## 3847                     znf592
## 3866                 zgc:153293
## 3868             RAB19 (1 of 3)
## 3880                      cecr5
## 3900                      ush1c
## 3929                      mob2a
## 3942                   ptprjb.2
## 3957                     osbpl5
## 3997                     mrpl23
## 4026                           
## 4027             RAB19 (2 of 3)
## 4031                 im:6904482
## 4084                       eea1
## 4091                     nudt4a
## 4092                     ube2nb
## 4108                      nr1h4
## 4126           si:dkey-103i16.1
## 4213                       cax2
## 4221           KLHDC10 (1 of 2)
## 4242                     chrm2b
## 4265           si:ch211-127m7.3
## 4285                       kxd1
## 4331            lrrc17 (1 of 2)
## 4344          si:ch211-236c15.2
## 4346                     RSBN1L
## 4357                       prr5
## 4359                      rad52
## 4377                      erc1a
## 4383                           
## 4462                      nudt7
## 4467                           
## 4492                       chkb
## 4521             CXCR4 (2 of 2)
## 4523                        PGC
## 4545                       arsa
## 4570                           
## 4584                           
## 4586                    shank3a
## 4597                      rabl2
## 4638                           
## 4639                           
## 4642                       pdp2
## 4672                     ripk3l
## 4693                      rbm28
## 4719                       hgfb
## 4804                    fam107b
## 4853                           
## 4880                    ppp6r2b
## 4922                      cnot2
## 4948                      ttc38
## 4958           si:ch211-59c24.1
## 4991                           
## 4996           CRACR2A (1 of 2)
## 5002             si:dkeyp-2c8.2
## 5020                      fgd4a
## 5056                       wee2
## 5064                  atp6v1e1a
## 5071                   slc25a18
## 5077                     cecr1a
## 5119                      wfdc1
## 5124                     kcng4a
## 5128           si:dkey-246g23.2
## 5131                     hsbp1b
## 5135                 zgc:173742
## 5209                 zgc:103697
## 5217                     rspry1
## 5220                     arl2bp
## 5240                      AIFM2
## 5246                       bbs2
## 5266                     b3gnt9
## 5292                           
## 5353                     tango6
## 5359                   mapk8ip1
## 6745 si:ch1073-89b12.1 (3 of 3)
## 6747             fads2 (3 of 3)
## 7089                     plxnc1
## 7984                      VSIG1
## 8497                    SNORA19

The paper

We limited differential expression analysis to only those genes represented by at least two reads per million mapped (“copies per million,” CPM) in at least 12 of the 84 libraries (see supplementary fig. S1, Supplementary Material online). We normalized read counts for these 15,847 genes using TMM normalization (Robinson and Oshlack 2010) as implemented by the calcNormFactors function of the R/Bioconductor package edgeR (Robinson et al. 2010). In order to perform gene-wise differential expression analyses in a general linear model framework (Law et al. 2014), we supplied the TMM normalization factors to the voom function of the R/Bioconductor package limma (Ritchie et al. 2015), which generated appropriately weighted log2CPM expression values for all observations. We then fit a linear model for each gene including the fixed effects of factor levels for host population, host family (nested within host population), sex, and microbiota treatment using the limma lmFit function. We did not include a library “batch” effect in the model because initial nMDS ordination did not suggest batch as a major source of transcriptional variation, and our stratified assignment of samples to batches controlled for any confounding effect of batch with respect to other factors of interest. To account for variation between replicate flasks we incorporated flask as a random effect in the model using the limma duplicateCorrelation function. Each hypothesis of interest was tested, for each gene, using one or more contrasts via moderated t-tests applied by the limma function eBayes. To evaluate the effect of our microbiota treatment we performed a within-OC contrast, a within-FW contrast, and an overall contrast. Genes expressed differentially in any of these three contrasts were interpreted as being associated with the presence of microbes. We performed a single contrast to test for an overall effect of host population, and a single contrast to test for an interaction between host population and microbiota, both of these accounting for family differences nested within population. Finally, we performed contrasts to test for an effect of sex and a sex-by-microbiota interaction. For each of these seven contrasts, we controlled the false discovery rate (FDR) at 0.1 using the approach of Benjamini and Hochberg (1995), as implemented by the limma topTable function.

Another look at error rates

Suppose that I got a positive result on an HIV test. What’s the chance I am HIV positive? (Here we really mean that “I” am a randomly chosen person from the US population.)

Background data

OraQUICK advance rapid test: 99.4% specificity and 99.8% sensitivity (from MLO online).

Refreshing from Wikipedia, specificity is the “true positive” rate and the sensitivity is the “true negative” rate:

if you have HIV, the chance that it (wrongly) comes out negative is .006 = 0.6%;
if you don’t have HIV, the chance that it (wrongly) comes out positive is .002 = 0.2%.

There are currently around 1.1 million people with HIV in the US, out of a total of 328 million, giving an overall rate of 0.00335 = 0.335%.

true_pos <- .994
true_neg <- .998
pop_rate <- 1.1 / 328

Probabilities

We want to know \[\begin{aligned} & \P\{ \text{HIV+} | \text{postive test} \} \\ & \qquad = \frac{\P\{ \text{HIV+ and getting a positive test} \} }{ \P\{ \text{getting a positive test} \} } \end{aligned}\]

Making that concrete with simulation

Start with a large sample from the US population, some of whom have HIV and others do not, and then give them all HIV tests.

N <- 1e6
people <- data.frame(
    hiv = runif(N) < pop_rate)
people$test <- NA
people$test[people$hiv] <- ifelse(runif(sum(people$hiv)) < true_pos, "+", "-")
people$test[!people$hiv] <- ifelse(runif(sum(!people$hiv)) < true_neg, "-", "+")
addmargins(table(status=people$hiv, test=people$test))

##        test
## status        -       +     Sum
##   FALSE  994744    2003  996747
##   TRUE       19    3234    3253
##   Sum    994763    5237 1000000

addmargins(table(status=people$hiv, test=people$test))

##        test
## status        -       +     Sum
##   FALSE  994744    2003  996747
##   TRUE       19    3234    3253
##   Sum    994763    5237 1000000

For example, there are 2003 who do not have HIV but got a positive test result.

What is the proportion of people who got a positive test result who actually have HIV?

Now let’s compute: the probability we want is the proportion of people who got a positive test result who actually have HIV:

hiv_given_plus <- sum(people$hiv & (people$test == "+")) / sum(people$test == "+")

The proportion of the 5237 in this sample of 10⁶ that had a positive test result that actually have HIV is 3234/5237 = 61.75%.

Conditional probabilities

We want to compute the theoretical probability of having HIV given a positive test result, or \[ P(\text{HIV} \;|\; + ) = \frac{P(\text{HIV and}\; +)}{P(+)} \]

The probability that a randomly chosen person from the population has HIV and got a positive test on this test is \[\begin{aligned} P(\text{HIV}\; \text{and}\; +) &= P(\text{HIV}) \times P(+ \;|\; \text{HIV}) \\ &= 0.00335 \times 0.994 \\ &= 0.0033299 \end{aligned}\]

We’ll also need the complementary probability, \[\begin{aligned} P(\text{not HIV}\; \text{and}\; +) &= P(\text{not HIV}) \times P(+ \;|\; \text{not HIV}) \\ &= (1 - 0.00335) \times (1 - 0.998) \\ &= 0.0019933 \end{aligned}\]

And, the probability that a randomly chosen person from the population has a positive test result is \[\begin{aligned} P(+) &= P(\text{HIV and}\; +) + P(\text{not HIV and}\; +) \\ &= 0.0033299 + 0.0019933 \\ &= 0.0053232. \end{aligned}\]

Putting these together, we get that \[\begin{aligned} P(\text{HIV} \;|\; + ) &= \frac{0.0033299}{0.0053232} \\ &= 0.6255448. \end{aligned}\] In other words, we get a predicted probability of 62.5%.