Peter Ralph
Advanced Biological Statistics
Suppose 100 people did 100 well-executed experiments to ask whether snails move faster while listening to metal than to Mozart.
How many would find a statistically significant difference at \(p < 0.05\)?
Would any find a large effect size?
Suppose someone conducts a well-controlled study that records coronavirus infection rates and the mean daily consumption of 100 different foods in a bunch of people.
How many of the foods would be statistically significantly associated with lower infection rates at \(p < 0.05\)?
Would any have a large effect size?
A \(p\)-value is
the probability of seeing something at least as extreme as what was seen in the data, if the null hypothesis were true.
So, if the null hypothesis is true, then by definition, \(p\)-values are uniformly distributed between 0 and 1,
and so \(\P\{ p < 0.05 \} = 0.05\).
A cutoff of \(p < 0.05\) ensures that, when the null hypothesis is true, you wrongly reject it no more than 5% of the time.
But what if you do \(n\) different tests, all at once?
To keep the probability of wrongly rejecting any of the \(n\) null hypotheses at or below 5%, take a cutoff of \(p < 0.05/n\).
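The arithmetic behind this cutoff can be checked directly. The following is a minimal sketch that assumes the \(n\) tests are independent and that every null hypothesis is true; independence is assumed here only to make the calculation easy, and the variable names are illustrative.

```r
# Probability of at least one false rejection among n independent tests
# when every null hypothesis is true.
n <- c(1, 10, 100, 1000)
fwer_uncorrected <- 1 - (1 - 0.05)^n       # each test uses the cutoff p < 0.05
fwer_bonferroni  <- 1 - (1 - 0.05 / n)^n   # each test uses the cutoff p < 0.05 / n
round(data.frame(n, fwer_uncorrected, fwer_bonferroni), 4)
```

With the uncorrected cutoff the chance of at least one false rejection climbs quickly towards 1 as \(n\) grows, while with the cutoff \(0.05/n\) it stays at or just below 5%.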
If you can tolerate some errors, control the false discovery rate instead: the expected proportion of false discoveries among the rejected hypotheses.
Suppose you test 1,000 different songs to see if they make snails go faster, obtain a \(p\)-value for each song, set a threshold \(p_0\), and study further those with \(p < p_0\).
We expect* no false positives if \(p_0 = 0.05 / 1000 = 0.00005\) (Bonferroni). (\(*\)at the 5% level)
If \(p_0\) is set to have a 5% false discovery rate,
and 100 songs fall below the threshold,
then we expect about 5 of these to be false positives.
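To see the two thresholds side by side, here is a small simulation sketch of the songs example. The number of songs with a real effect (100) and the size of that effect (a shift of 4 in a z-statistic) are made-up values for illustration, not taken from any data.

```r
set.seed(1)
n_songs <- 1000
n_real  <- 100                              # hypothetical: songs with a true effect
is_real <- seq_len(n_songs) <= n_real
# z-statistics: mean 0 under the null, mean 4 for real effects (made-up effect size)
z <- rnorm(n_songs, mean = ifelse(is_real, 4, 0))
p <- 2 * pnorm(-abs(z))                     # two-sided p-values

bonf <- p.adjust(p, method = "bonferroni") < 0.05   # same as p < 0.05 / n_songs
bh   <- p.adjust(p, method = "BH") < 0.05           # aims for a 5% false discovery rate

# Number of discoveries, and how many of those are false, for each method
rbind(
  bonferroni = c(discoveries = sum(bonf), false = sum(bonf & !is_real)),
  BH         = c(discoveries = sum(bh),   false = sum(bh & !is_real))
)
```

Every song that passes the Bonferroni cutoff also passes the BH cutoff, so controlling the false discovery rate can only add discoveries, at the cost of allowing a few false ones among them.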
> help(p.adjust)
p.adjust package:stats R Documentation
Adjust P-values for Multiple Comparisons
Description:
Given a set of p-values, returns p-values adjusted using one of
several methods.
Usage:
p.adjust(p, method = p.adjust.methods, n = length(p))
p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
# "fdr", "none")
Arguments:
p: numeric vector of p-values (possibly with ‘NA’s). Any other
R object is coerced by ‘as.numeric’.
method: correction method, a ‘character’ string. Can be abbreviated.
n: number of comparisons, must be at least ‘length(p)’; only set
this (to non-default) when you know what you are doing!
Details:
The adjustment methods include the Bonferroni correction
(‘"bonferroni"’) in which the p-values are multiplied by the
number of comparisons. Less conservative corrections are also
included by Holm (1979) (‘"holm"’), Hochberg (1988)
(‘"hochberg"’), Hommel (1988) (‘"hommel"’), Benjamini & Hochberg
(1995) (‘"BH"’ or its alias ‘"fdr"’), and Benjamini & Yekutieli
(2001) (‘"BY"’), respectively. A pass-through option (‘"none"’)
is also included. The set of methods are contained in the
‘p.adjust.methods’ vector for the benefit of methods that need to
have the method as an option and pass it on to ‘p.adjust’.
The first four methods are designed to give strong control of the
family-wise error rate. There seems no reason to use the
unmodified Bonferroni correction because it is dominated by Holm's
method, which is also valid under arbitrary assumptions.
Hochberg's and Hommel's methods are valid when the hypothesis
tests are independent or when they are non-negatively associated
(Sarkar, 1998; Sarkar and Chang, 1997). Hommel's method is more
powerful than Hochberg's, but the difference is usually small and
the Hochberg p-values are faster to compute.
The ‘"BH"’ (aka ‘"fdr"’) and ‘"BY"’ methods of Benjamini,
Hochberg, and Yekutieli control the false discovery rate, the
expected proportion of false discoveries amongst the rejected
hypotheses. The false discovery rate is a less stringent
condition than the family-wise error rate, so these methods are
more powerful than the others.
Note that you can set ‘n’ larger than ‘length(p)’ which means the
unobserved p-values are assumed to be greater than all the
observed p for ‘"bonferroni"’ and ‘"holm"’ methods and equal to 1
for the other methods.
Value:
A numeric vector of corrected p-values (of the same length as ‘p’,
with names copied from ‘p’).
References:
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false
discovery rate: a practical and powerful approach to multiple
testing. _Journal of the Royal Statistical Society Series B_,
*57*, 289-300. <URL: http://www.jstor.org/stable/2346101>.
Benjamini, Y., and Yekutieli, D. (2001). The control of the false
discovery rate in multiple testing under dependency. _Annals of
Statistics_, *29*, 1165-1188. doi: 10.1214/aos/1013699998 (URL:
https://doi.org/10.1214/aos/1013699998).
Holm, S. (1979). A simple sequentially rejective multiple test
procedure. _Scandinavian Journal of Statistics_, *6*, 65-70.
<URL: http://www.jstor.org/stable/4615733>.
Hommel, G. (1988). A stagewise rejective multiple test procedure
based on a modified Bonferroni test. _Biometrika_, *75*, 383-386.
doi: 10.2307/2336190 (URL: https://doi.org/10.2307/2336190).
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple
tests of significance. _Biometrika_, *75*, 800-803. doi:
10.2307/2336325 (URL: https://doi.org/10.2307/2336325).
Shaffer, J. P. (1995). Multiple hypothesis testing. _Annual
Review of Psychology_, *46*, 561-584. doi:
10.1146/annurev.ps.46.020195.003021 (URL:
https://doi.org/10.1146/annurev.ps.46.020195.003021). (An
excellent review of the area.)
Sarkar, S. (1998). Some probability inequalities for ordered MTP2
random variables: a proof of Simes conjecture. _Annals of
Statistics_, *26*, 494-504. doi: 10.1214/aos/1028144846 (URL:
https://doi.org/10.1214/aos/1028144846).
Sarkar, S., and Chang, C. K. (1997). The Simes method for
multiple hypothesis testing with positively dependent test
statistics. _Journal of the American Statistical Association_,
*92*, 1601-1608. doi: 10.2307/2965431 (URL:
https://doi.org/10.2307/2965431).
Wright, S. P. (1992). Adjusted P-values for simultaneous
inference. _Biometrics_, *48*, 1005-1013. doi: 10.2307/2532694
(URL: https://doi.org/10.2307/2532694). (Explains the adjusted
P-value approach.)
See Also:
‘pairwise.*’ functions such as ‘pairwise.t.test’.
Examples:
require(graphics)
set.seed(123)
x <- rnorm(50, mean = c(rep(0, 25), rep(3, 25)))
p <- 2*pnorm(sort(-abs(x)))
round(p, 3)
round(p.adjust(p), 3)
round(p.adjust(p, "BH"), 3)
## or all of them at once (dropping the "fdr" alias):
p.adjust.M <- p.adjust.methods[p.adjust.methods != "fdr"]
p.adj <- sapply(p.adjust.M, function(meth) p.adjust(p, meth))
p.adj.60 <- sapply(p.adjust.M, function(meth) p.adjust(p, meth, n = 60))
stopifnot(identical(p.adj[,"none"], p), p.adj <= p.adj.60)
round(p.adj, 3)
## or a bit nicer:
noquote(apply(p.adj, 2, format.pval, digits = 3))
## and a graphic:
matplot(p, p.adj, ylab="p.adjust(p, meth)", type = "l", asp = 1, lty = 1:6,
main = "P-value adjustments")
legend(0.7, 0.6, p.adjust.M, col = 1:6, lty = 1:6)
## Can work with NA's:
pN <- p; iN <- c(46, 47); pN[iN] <- NA
pN.a <- sapply(p.adjust.M, function(meth) p.adjust(pN, meth))
## The smallest 20 P-values all affected by the NA's :
round((pN.a / p.adj)[1:20, ] , 4)
Modify the code so that 20 of the datasets have a mean of \(\mu=1\) (not zero, as in the code below), and see how many of the \(p\)-values are below 0.05.
tp <- replicate(1000, t.test(rnorm(20))$p.value)
sprintf("%d of the %d p-values are below 0.05.", sum(tp < 0.05), length(tp))
layout(t(1:2))
hist(tp, breaks=40, xlab='p-value')
plot(sort(tp), xlim=c(1,100), ylim=c(0, 0.1), ylab='p-values, sorted')
abline(h=c(0.05, 0.05/length(tp)), col=1:2)
legend("topright", lty=1, col=1:2, legend=paste("p=", c(0.05, 0.05/length(tp))))
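One possible way to approach the exercise is sketched below. It is only one of several reasonable modifications; the vector `mu` and the choice to put the 20 nonzero means first are illustrative choices.

```r
set.seed(3)
# 1000 datasets of 20 observations each; the first 20 datasets have mean 1
mu <- c(rep(1, 20), rep(0, 980))
tp <- sapply(mu, function (m) t.test(rnorm(20, mean = m))$p.value)
# Cross-tabulate which datasets truly have a nonzero mean against p < 0.05
table(true_effect = (mu != 0), significant = (tp < 0.05))
sprintf("%d of the %d p-values are below 0.05.", sum(tp < 0.05), length(tp))
```

The table separates true positives from false positives, which the raw count of small \(p\)-values cannot do on its own.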
fish <- read.table("../Datasets/stickleback_GFvsCV_RNAseq/CVvsGF_RNAseq_Metadata.tsv", header=TRUE, sep='\t')
tmp <- read.table("../Datasets/stickleback_GFvsCV_RNAseq/CVvsGF_RNAseq_CPM.tsv", header=TRUE, sep='\t', stringsAsFactors=FALSE, check.names=FALSE)
genes <- tmp[,1:5]
expression <- as.matrix(tmp[,6:ncol(tmp)])
# consistency check
stopifnot(all(match(colnames(expression), fish$Individual) == 1:nrow(fish)))
There are 8506 genes whose expression is measured in 84 fish.
To put coefficients on the same scale:
Gene expression varies across many orders of magnitude:
Fit lots of models:
pop_lms <- apply(expr, 1, function (x) (lm(x ~ Population, data=fish)))
all_lms <- apply(expr, 1, function (x) (lm(x ~ Population + Treatment + Sex, data=fish)))
anovas <- mapply(anova, pop_lms, all_lms, SIMPLIFY=FALSE)
and extract coefficients, \(p\)-values
… for an ANOVA comparing
gene expression ~ Population
gene expression ~ Population + Treatment + Sex
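The extraction step might look something like the sketch below. It runs on small simulated stand-ins (`fish_sim`, `expr_sim`) rather than the real data, and the object names `anova_p`, `coefs`, and `coef_p` are hypothetical, so this is not necessarily how the original analysis did it.

```r
set.seed(2)
# Hypothetical stand-ins for `fish` and `expr`: 20 "fish", 5 "genes" of pure noise
fish_sim <- data.frame(
  Population = rep(c("A", "B"), each = 10),
  Treatment  = rep(c("CV", "GF"), times = 10),
  Sex        = rep(c("F", "F", "M", "M"), times = 5)
)
expr_sim <- matrix(rnorm(5 * 20), nrow = 5)

pop_lms <- apply(expr_sim, 1, function (x) lm(x ~ Population, data = fish_sim))
all_lms <- apply(expr_sim, 1, function (x) lm(x ~ Population + Treatment + Sex, data = fish_sim))
anovas  <- mapply(anova, pop_lms, all_lms, SIMPLIFY = FALSE)

# p-value of the F test comparing the two nested models, one per gene
anova_p <- sapply(anovas, function (a) a[["Pr(>F)"]][2])
# Coefficient estimates and their p-values from the larger model (one column per gene)
coefs  <- sapply(all_lms, function (m) coef(summary(m))[, "Estimate"])
coef_p <- sapply(all_lms, function (m) coef(summary(m))[, "Pr(>|t|)"])
```

The ANOVA table for two nested models has one row per model, and the \(p\)-value for the comparison sits in the second row of the `Pr(>F)` column.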
1158 \(p\)-values are less than 0.05.
But, out of 8506, we’d expect about 425 to be less than 0.05 even if every null hypothesis were true.
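The expected count is just \(0.05 \times 8506\). As a rough check of how surprising 1158 is, the sketch below also computes a binomial tail probability, which assumes the tests are independent; that assumption is made here for illustration and need not hold for gene expression data.

```r
n_genes <- 8506    # number of genes tested (from the text)
n_below <- 1158    # observed number of p-values below 0.05 (from the text)
expected_null <- 0.05 * n_genes    # about 425 if every null hypothesis were true
expected_null
# Chance of n_below or more small p-values if all nulls were true and tests independent
excess_prob <- pbinom(n_below - 1, size = n_genes, prob = 0.05, lower.tail = FALSE)
excess_prob
```

The tail probability is vanishingly small, so under these assumptions the surplus of small \(p\)-values is very unlikely to be chance alone.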
Let’s look at the three coefficients for all the models:
x ~ Population + Treatment + Sex
Coefficients, with \(p < 0.05\) in red:
Coefficients, with \(p < 0.05/n\) in red (Bonferroni!):
## Gene_ID Genome_Loc Gene_Start_bp Gene_Description
## 379 ENSGACG00000001729 scaffold_882 1249 *interleukin-8 [Lateolabrax japonicus];ABI48894 [Source:TopBlasxHit]
## 594 ENSGACG00000002048 groupXXI 2362051 RNA-binding region (RNP1, RRM) containing 3 [Source:ZFIN;Acc:ZDB-GENE-060312-35]
## 833 ENSGACG00000002397 groupV 842100 stearoyl-CoA desaturase b [Source:ZFIN;Acc:ZDB-GENE-050522-12]
## 985 ENSGACG00000003874 groupXIX 5074262 parvin, beta [Source:ZFIN;Acc:ZDB-GENE-030131-4411]
## 1005 ENSGACG00000003899 groupXIX 5087963 parvin, gamma [Source:ZFIN;Acc:ZDB-GENE-070410-67]
## 1014 ENSGACG00000003911 groupXIX 5234537 plexin b2b [Source:ZFIN;Acc:ZDB-GENE-080902-1]
## 1028 ENSGACG00000003928 groupXIX 5268421 tubulin, gamma complex associated protein 6 [Source:HGNC Symbol;Acc:HGNC:18127]
## 1042 ENSGACG00000003947 groupXIX 5289965 adaptor protein, phosphotyrosine interaction, PH domain and leucine zipper containing 2 [Source:ZFIN;Acc:ZDB-GENE-081016-2]
## 1102 ENSGACG00000004020 groupXIX 5466205 transmembrane protein 117 [Source:ZFIN;Acc:ZDB-GENE-040426-2809]
## 1109 ENSGACG00000004028 groupXIX 5497683 twinfilin actin-binding protein 1 [Source:HGNC Symbol;Acc:HGNC:9620]
## 1128 ENSGACG00000004050 groupXIX 5505366 interleukin-1 receptor-associated kinase 4 [Source:ZFIN;Acc:ZDB-GENE-040426-738]
## 1138 ENSGACG00000004063 groupXIX 5536933 ADAM metallopeptidase with thrombospondin type 1 motif, 20 [Source:HGNC Symbol;Acc:HGNC:17178]
## 1155 ENSGACG00000004088 groupXIX 5663914 prickle homolog 1a (Drosophila) [Source:ZFIN;Acc:ZDB-GENE-030724-5]
## 1161 ENSGACG00000004098 groupXIX 5670651 periphilin 1 [Source:HGNC Symbol;Acc:HGNC:19369]
## 1162 ENSGACG00000004099 groupXIX 5675242
## 1165 ENSGACG00000004103 groupXIX 5687641 YY1 associated factor 2 [Source:ZFIN;Acc:ZDB-GENE-041210-115]
## 1172 ENSGACG00000004112 groupXIX 5691367 glucoside xylosyltransferase 1b [Source:ZFIN;Acc:ZDB-GENE-041210-116]
## 1185 ENSGACG00000004129 groupXIX 5819632 contactin 1 [Source:HGNC Symbol;Acc:HGNC:2171]
## 1193 ENSGACG00000004140 groupXIX 5932949 patatin-like phospholipase domain containing 8 [Source:HGNC Symbol;Acc:HGNC:28900]
## 1195 ENSGACG00000004142 groupXIX 5949691 DnaJ (Hsp40) homolog, subfamily B, member 9a [Source:ZFIN;Acc:ZDB-GENE-050626-115]
## 1197 ENSGACG00000004145 groupXIX 5951392 dynamin 1-like [Source:ZFIN;Acc:ZDB-GENE-040426-1556]
## 1254 ENSGACG00000004208 groupXIX 5968125 caldesmon 1b [Source:ZFIN;Acc:ZDB-GENE-090313-229]
## 1261 ENSGACG00000004215 groupXIX 5984689 2,3-bisphosphoglycerate mutase [Source:ZFIN;Acc:ZDB-GENE-040718-375]
## 1304 ENSGACG00000004276 groupXIX 6011659 CCR4-NOT transcription complex, subunit 4a [Source:ZFIN;Acc:ZDB-GENE-090313-262]
## 1338 ENSGACG00000004329 groupXIX 6085324 lactate dehydrogenase Bb [Source:ZFIN;Acc:ZDB-GENE-040718-176]
## 1341 ENSGACG00000004333 groupXIX 6090639 golgi transport 1Ba [Source:ZFIN;Acc:ZDB-GENE-041210-157]
## 1355 ENSGACG00000004350 groupXIX 6093858 solute carrier family 35, member B4 [Source:ZFIN;Acc:ZDB-GENE-030131-2457]
## 1359 ENSGACG00000004357 groupXIX 6098400 coiled-coil-helix-coiled-coil-helix domain containing 3b [Source:ZFIN;Acc:ZDB-GENE-030131-4476]
## 1381 ENSGACG00000004380 groupXIX 6111536 neuroepithelial cell transforming 1 [Source:HGNC Symbol;Acc:HGNC:14592]
## 1384 ENSGACG00000004384 groupXIX 6122297 ankyrin repeat and SOCS box containing 13b [Source:ZFIN;Acc:ZDB-GENE-091118-116]
## 1426 ENSGACG00000004439 groupXIX 6183509 dehydrogenase E1 and transketolase domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-041212-44]
## 1463 ENSGACG00000004485 groupXIX 6201210 calcium/calmodulin-dependent protein kinase 1Db [Source:ZFIN;Acc:ZDB-GENE-070112-1872]
## 1473 ENSGACG00000004498 groupXIX 6213441 cyclin-dependent kinase 16 [Source:ZFIN;Acc:ZDB-GENE-030131-2939]
## 1490 ENSGACG00000004521 groupXIX 6234660
## 1518 ENSGACG00000004555 groupXIX 6253382 netrin 4 [Source:ZFIN;Acc:ZDB-GENE-050310-1]
## 1534 ENSGACG00000004578 groupXIX 6269682 ubiquitin specific peptidase 3 [Source:ZFIN;Acc:ZDB-GENE-030131-5142]
## 1559 ENSGACG00000004613 groupXIX 6294749 mannosidase, alpha, class 2C, member 1 [Source:ZFIN;Acc:ZDB-GENE-101103-4]
## 1609 ENSGACG00000004670 groupXIX 6301852 nei endonuclease VIII-like 1 (E. coli) [Source:ZFIN;Acc:ZDB-GENE-040426-994]
## 1612 ENSGACG00000004675 groupXIX 6304507 COMM domain containing 4 [Source:ZFIN;Acc:ZDB-GENE-060929-600]
## 1616 ENSGACG00000004680 groupXIX 6310208 semaphorin 7A [Source:ZFIN;Acc:ZDB-GENE-030131-3633]
## 1627 ENSGACG00000004691 groupXIX 6319533 immunoglobulin superfamily containing leucine-rich repeat 2 [Source:ZFIN;Acc:ZDB-GENE-050320-95]
## 1630 ENSGACG00000004695 groupXIX 6323877 stimulated by retinoic acid 6 [Source:ZFIN;Acc:ZDB-GENE-060616-252]
## 1651 ENSGACG00000004724 groupXIX 6381558 stomatin (EPB72)-like 1 [Source:ZFIN;Acc:ZDB-GENE-070209-241]
## 1667 ENSGACG00000004744 groupXIX 6392993 hexosaminidase A (alpha polypeptide) [Source:ZFIN;Acc:ZDB-GENE-050417-283]
## 1671 ENSGACG00000004748 groupI 474993 eukaryotic translation initiation factor 2A [Source:ZFIN;Acc:ZDB-GENE-050626-52]
## 1710 ENSGACG00000004795 groupXIX 6430393 phosphopantothenoylcysteine decarboxylase [Source:ZFIN;Acc:ZDB-GENE-040426-1749]
## 1720 ENSGACG00000004806 groupXIX 6434587 3-hydroxyacyl-CoA dehydratase 3 [Source:ZFIN;Acc:ZDB-GENE-040426-1200]
## 1739 ENSGACG00000004827 groupXIX 6437512 von Willebrand factor A domain containing 9 [Source:ZFIN;Acc:ZDB-GENE-030131-5804]
## 1772 ENSGACG00000004867 groupXIX 6453754 DENN/MADD domain containing 4A [Source:ZFIN;Acc:ZDB-GENE-060503-285]
## 1798 ENSGACG00000004903 groupXIX 6544984 mitogen-activated protein kinase kinase 1 [Source:ZFIN;Acc:ZDB-GENE-040426-2759]
## 1939 ENSGACG00000005078 groupXIX 6569082 small nuclear RNA activating complex, polypeptide 5 [Source:ZFIN;Acc:ZDB-GENE-041111-50]
## 1946 ENSGACG00000005085 groupXIX 6609904 SMAD family member 6b [Source:ZFIN;Acc:ZDB-GENE-050419-198]
## 1951 ENSGACG00000005092 groupXIX 6644849 SMAD family member 3b [Source:ZFIN;Acc:ZDB-GENE-030128-4]
## 1975 ENSGACG00000005119 groupXIX 6652728 alpha- and gamma-adaptin binding protein [Source:ZFIN;Acc:ZDB-GENE-040718-120]
## 2009 ENSGACG00000005167 groupXIX 6676365 zgc:162898 [Source:ZFIN;Acc:ZDB-GENE-070410-107]
## 2015 ENSGACG00000005176 groupXIX 6715666 protein inhibitor of activated STAT, 1b [Source:ZFIN;Acc:ZDB-GENE-050419-202]
## 2020 ENSGACG00000005181 groupXIX 6726200 mortality factor 4 like 1 [Source:ZFIN;Acc:ZDB-GENE-040718-348]
## 2072 ENSGACG00000005252 groupXIX 6844584 damage-specific DNA binding protein 2 [Source:ZFIN;Acc:ZDB-GENE-050419-169]
## 2075 ENSGACG00000005255 groupXIX 6848563 kelch repeat and BTB (POZ) domain containing 4 [Source:ZFIN;Acc:ZDB-GENE-040426-937]
## 2101 ENSGACG00000005293 groupXIX 6862001 immunoglobulin mu binding protein 2 [Source:ZFIN;Acc:ZDB-GENE-050419-258]
## 2127 ENSGACG00000005331 groupXIX 6868449 chitinase domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-030131-9169]
## 2145 ENSGACG00000005355 groupXIX 6872104 Parkinson disease 7 domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-051030-96]
## 2155 ENSGACG00000005365 groupXIX 6873765 CD151 molecule [Source:ZFIN;Acc:ZDB-GENE-041010-137]
## 2180 ENSGACG00000005397 groupXIX 6878614 si:ch211-247i17.1 [Source:ZFIN;Acc:ZDB-GENE-131121-275]
## 2182 ENSGACG00000005399 groupXIX 6880605 calcium release activated channel regulator 2B [Source:ZFIN;Acc:ZDB-GENE-061215-136]
## 2187 ENSGACG00000005406 groupXIX 6886885 transmembrane protein 138 [Source:ZFIN;Acc:ZDB-GENE-120912-1]
## 2194 ENSGACG00000005414 groupXIX 6889077 transmembrane protein 258 [Source:ZFIN;Acc:ZDB-GENE-040426-1739]
## 2195 ENSGACG00000005416 groupXIX 6895042 myelin regulatory factor [Source:ZFIN;Acc:ZDB-GENE-080204-57]
## 2209 ENSGACG00000005436 groupXIX 6921327 si:ch1073-89b12.1 [Source:ZFIN;Acc:ZDB-GENE-131121-340]
## 2233 ENSGACG00000005468 groupXIX 6952818 synaptotagmin VIIa [Source:ZFIN;Acc:ZDB-GENE-090601-5]
## 2243 ENSGACG00000005483 groupXIX 7010663
## 2248 ENSGACG00000005489 groupXIX 7013605 si:dkey-201c1.2 [Source:ZFIN;Acc:ZDB-GENE-110408-61]
## 2263 ENSGACG00000005509 groupXIX 7017971
## 2268 ENSGACG00000005514 groupXIX 7023483 p53-induced death domain protein 1 [Source:ZFIN;Acc:ZDB-GENE-081104-353]
## 2288 ENSGACG00000005541 groupXIX 7040298 zmp:0000001167 [Source:ZFIN;Acc:ZDB-GENE-140106-127]
## 2307 ENSGACG00000005561 groupXIX 7055587 ATH1, acid trehalase-like 1 (yeast) [Source:ZFIN;Acc:ZDB-GENE-061103-319]
## 2345 ENSGACG00000005613 groupXIX 7078941 RAB3A interacting protein (rabin3)-like 1 [Source:ZFIN;Acc:ZDB-GENE-110921-5]
## 2361 ENSGACG00000005632 groupXIX 7091066 Hermansky-Pudlak syndrome 5 [Source:ZFIN;Acc:ZDB-GENE-070410-80]
## 2377 ENSGACG00000005655 groupX 8128412 small nuclear ribonucleoprotein 40 (U5) [Source:ZFIN;Acc:ZDB-GENE-040426-978]
## 2381 ENSGACG00000005659 groupXIX 7099791 general transcription factor IIH, polypeptide 1 [Source:ZFIN;Acc:ZDB-GENE-040912-164]
## 2445 ENSGACG00000005940 groupXIX 7354155 fin bud initiation factor a [Source:ZFIN;Acc:ZDB-GENE-111031-2]
## 2473 ENSGACG00000005974 groupXIX 7374591 caseinolytic mitochondrial matrix peptidase chaperone subunit b [Source:ZFIN;Acc:ZDB-GENE-130404-1]
## 2481 ENSGACG00000005988 groupXIX 7395960 adaptor-related protein complex 4, epsilon 1 subunit [Source:ZFIN;Acc:ZDB-GENE-061221-3]
## 2487 ENSGACG00000005996 groupXIX 7404582 guanine nucleotide binding protein (G protein), beta 5a [Source:ZFIN;Acc:ZDB-GENE-070112-342]
## 2490 ENSGACG00000006001 groupXIX 7409406 myosin VC [Source:ZFIN;Acc:ZDB-GENE-131127-196]
## 2508 ENSGACG00000006025 groupXIX 7420099 myosin VAb [Source:ZFIN;Acc:ZDB-GENE-050411-72]
## 2533 ENSGACG00000006058 groupXIX 7436490 ribosomal L24 domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-040426-1925]
## 2564 ENSGACG00000006101 groupXIX 7477087 transcription factor 12 [Source:HGNC Symbol;Acc:HGNC:11623]
## 2571 ENSGACG00000006110 groupXIX 7488251 cingulin-like 1 [Source:HGNC Symbol;Acc:HGNC:25931]
## 2588 ENSGACG00000006135 groupXIX 7504978 ADAM metallopeptidase domain 10b [Source:ZFIN;Acc:ZDB-GENE-071115-1]
## 2594 ENSGACG00000006141 groupXIX 7515235 FANCD2/FANCI-associated nuclease 1 [Source:ZFIN;Acc:ZDB-GENE-030131-6225]
## 2613 ENSGACG00000006260 groupXIX 7626845 sulfide quinone reductase-like (yeast) [Source:ZFIN;Acc:ZDB-GENE-050417-436]
## 2628 ENSGACG00000006281 groupXIX 7634611 CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase like 2b [Source:ZFIN;Acc:ZDB-GENE-030131-1809]
## 2656 ENSGACG00000006315 groupXIX 7644684 poly (ADP-ribose) polymerase family, member 16 [Source:ZFIN;Acc:ZDB-GENE-040426-2289]
## 2678 ENSGACG00000006340 groupXIX 7654842
## 2687 ENSGACG00000006351 groupXIX 7669052 UDP glucuronosyltransferase 5 family, polypeptide A1 [Source:ZFIN;Acc:ZDB-GENE-051120-60]
## 2813 ENSGACG00000006516 groupXIX 7716217 cytochrome c oxidase subunit Vaa [Source:ZFIN;Acc:ZDB-GENE-050522-133]
## 2815 ENSGACG00000006521 groupXIX 7718951 A kinase (PRKA) interacting protein 1 [Source:ZFIN;Acc:ZDB-GENE-030829-24]
## 2857 ENSGACG00000006582 groupXIX 7755202 proteasome (prosome, macropain) subunit, alpha type, 1 [Source:ZFIN;Acc:ZDB-GENE-040801-15]
## 2915 ENSGACG00000006659 groupXIX 7928238 zgc:56106 [Source:ZFIN;Acc:ZDB-GENE-040426-904]
## 2934 ENSGACG00000006687 groupXIX 7934480 pleckstrin homology domain containing, family A member 7a [Source:ZFIN;Acc:ZDB-GENE-050419-75]
## 2954 ENSGACG00000008302 groupXIX 9271720 synaptotagmin VIII [Source:ZFIN;Acc:ZDB-GENE-060303-4]
## 2962 ENSGACG00000008315 groupXIX 9287888 troponin I type 2a (skeletal, fast), tandem duplicate 1 [Source:ZFIN;Acc:ZDB-GENE-041114-60]
## 2981 ENSGACG00000008376 groupXIX 9337694 lymphocyte-specific protein 1 [Source:ZFIN;Acc:ZDB-GENE-131127-171]
## 3080 ENSGACG00000008494 groupXIX 9448469 si:ch1073-174d20.2 [Source:ZFIN;Acc:ZDB-GENE-121214-51]
## 3107 ENSGACG00000008531 groupXIX 9501117 ring finger and WD repeat domain 3 [Source:ZFIN;Acc:ZDB-GENE-120529-1]
## 3113 ENSGACG00000008543 groupXIX 9517063 transmembrane emp24 protein transport domain containing 6 [Source:ZFIN;Acc:ZDB-GENE-131121-182]
## 3117 ENSGACG00000008550 groupXIX 9523896 zinc finger, DHHC-type containing 7 [Source:HGNC Symbol;Acc:HGNC:18459]
## 3132 ENSGACG00000008572 groupXIX 9532775 ankyrin repeat domain 27 (VPS9 domain) [Source:ZFIN;Acc:ZDB-GENE-121105-1]
## 3136 ENSGACG00000008577 groupXIX 9535480 ankyrin repeat domain 27 (VPS9 domain) [Source:ZFIN;Acc:ZDB-GENE-121105-1]
## 3152 ENSGACG00000008599 groupXIX 9564658 zgc:162267 [Source:ZFIN;Acc:ZDB-GENE-070410-53]
## 3158 ENSGACG00000008607 groupXI 5736635 signal transducer and activator of transcription 3 (acute-phase response factor) [Source:ZFIN;Acc:ZDB-GENE-980526-68]
## 3163 ENSGACG00000008612 groupXIX 9579090
## 3168 ENSGACG00000008617 groupXIX 9607677 protein tyrosine phosphatase, receptor-type, Z polypeptide 1a [Source:ZFIN;Acc:ZDB-GENE-090406-1]
## 3185 ENSGACG00000008638 groupXIX 9640525 aminoadipate-semialdehyde synthase [Source:ZFIN;Acc:ZDB-GENE-061220-8]
## 3199 ENSGACG00000008655 groupXIX 9666876 Ca++-dependent secretion activator 2 [Source:ZFIN;Acc:ZDB-GENE-030903-1]
## 3221 ENSGACG00000008687 groupXIX 9749129 ankyrin repeat and SOCS box containing 15a [Source:ZFIN;Acc:ZDB-GENE-110421-5]
## 3255 ENSGACG00000008743 groupXIX 9835581 protection of telomeres 1 homolog [Source:ZFIN;Acc:ZDB-GENE-110324-1]
## 3280 ENSGACG00000008779 groupXIX 9929496
## 3283 ENSGACG00000008783 groupXIX 9933073 zgc:101553 [Source:ZFIN;Acc:ZDB-GENE-041114-124]
## 3311 ENSGACG00000008837 groupXIX 9945600
## 3314 ENSGACG00000008843 groupXIX 9951237 FtsJ RNA methyltransferase homolog 1 (E. coli) [Source:ZFIN;Acc:ZDB-GENE-041114-83]
## 3350 ENSGACG00000008898 groupXIX 10047090 ceramide kinase [Source:HGNC Symbol;Acc:HGNC:19256]
## 3359 ENSGACG00000008907 groupXIX 10061760 si:ch211-286k11.4 [Source:ZFIN;Acc:ZDB-GENE-131121-159]
## 3365 ENSGACG00000008914 groupXIX 10072808 GRAM domain containing 4b [Source:ZFIN;Acc:ZDB-GENE-030131-4780]
## 3376 ENSGACG00000008928 groupXIX 10081037
## 3392 ENSGACG00000008949 groupXIX 10101541 Bet1 golgi vesicular membrane trafficking protein-like [Source:ZFIN;Acc:ZDB-GENE-040822-2]
## 3416 ENSGACG00000008981 groupXIX 10122821 G-2 and S-phase expressed 1 [Source:ZFIN;Acc:ZDB-GENE-050522-493]
## 3441 ENSGACG00000009014 groupXIX 10134903
## 3442 ENSGACG00000009015 groupXIX 10148833 cortactin binding protein 2 [Source:ZFIN;Acc:ZDB-GENE-030131-8134]
## 3463 ENSGACG00000009039 groupXIX 10185959 cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7) [Source:ZFIN;Acc:ZDB-GENE-050517-20]
## 3512 ENSGACG00000009107 groupXIX 10282175 capping protein (actin filament) muscle Z-line, alpha 2 [Source:HGNC Symbol;Acc:HGNC:1490]
## 3583 ENSGACG00000009201 groupXIX 10353716 caveolin 1 [Source:ZFIN;Acc:ZDB-GENE-030131-2415]
## 3584 ENSGACG00000009202 groupXIX 10363935 caveolin 2 [Source:ZFIN;Acc:ZDB-GENE-040625-164]
## 3590 ENSGACG00000009210 groupXIX 10390061 testis derived transcript (3 LIM domains) [Source:ZFIN;Acc:ZDB-GENE-040718-59]
## 3656 ENSGACG00000009309 groupXIX 10403743 centrosomal protein 41 [Source:ZFIN;Acc:ZDB-GENE-040704-35]
## 3674 ENSGACG00000009342 groupXIX 10441422 dual specificity phosphatase 6 [Source:ZFIN;Acc:ZDB-GENE-030613-1]
## 3703 ENSGACG00000009378 groupXIX 10583153 transmembrane and tetratricopeptide repeat containing 3 [Source:ZFIN;Acc:ZDB-GENE-061221-2]
## 3727 ENSGACG00000010003 groupXIX 11486565 phosphatidylinositol-4-phosphate 3-kinase, catalytic subunit type 2 gamma [Source:HGNC Symbol;Acc:HGNC:8973]
## 3736 ENSGACG00000010014 groupXIX 11517961 pleckstrin homology domain containing, family A member 5 [Source:HGNC Symbol;Acc:HGNC:30036]
## 3745 ENSGACG00000010024 groupXIX 11574991 AE binding protein 2 [Source:HGNC Symbol;Acc:HGNC:24051]
## 3747 ENSGACG00000010026 groupXIII 9810875 ankyrin repeat and death domain containing 1B [Source:ZFIN;Acc:ZDB-GENE-060526-136]
## 3752 ENSGACG00000010032 groupXIX 11627393 phosphodiesterase 3A, cGMP-inhibited [Source:HGNC Symbol;Acc:HGNC:8778]
## 3760 ENSGACG00000010042 groupXIX 11688379 B-cell receptor-associated protein 29 [Source:HGNC Symbol;Acc:HGNC:24131]
## 3768 ENSGACG00000010054 groupXIX 11737187 HMG-box transcription factor 1 [Source:ZFIN;Acc:ZDB-GENE-050522-414]
## 3774 ENSGACG00000010060 groupXIX 11744274 protein kinase, cAMP-dependent, regulatory, type II, beta [Source:HGNC Symbol;Acc:HGNC:9392]
## 3790 ENSGACG00000010863 groupXIX 12871147 Bloom syndrome, RecQ helicase-like [Source:ZFIN;Acc:ZDB-GENE-070702-5]
## 3814 ENSGACG00000010898 groupXIX 12932396 alpha-kinase 3a [Source:ZFIN;Acc:ZDB-GENE-050419-48]
## 3847 ENSGACG00000010936 groupXIX 12952501 zinc finger protein 592 [Source:ZFIN;Acc:ZDB-GENE-030131-9613]
## 3866 ENSGACG00000010963 groupXIX 12980638 zgc:153293 [Source:ZFIN;Acc:ZDB-GENE-060825-315]
## 3868 ENSGACG00000010965 groupXIX 12984613 RAB19, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:19982]
## 3880 ENSGACG00000010978 groupXIX 12985921 cat eye syndrome chromosome region, candidate 5 [Source:ZFIN;Acc:ZDB-GENE-080220-59]
## 3900 ENSGACG00000011004 groupXIX 12995354 Usher syndrome 1C (autosomal recessive, severe) [Source:ZFIN;Acc:ZDB-GENE-060312-41]
## 3929 ENSGACG00000011046 groupXIX 13053541 MOB kinase activator 2a [Source:ZFIN;Acc:ZDB-GENE-040718-56]
## 3942 ENSGACG00000011062 groupXIX 13172917 protein tyrosine phosphatase, receptor type, Jb, tandem duplicate 2 [Source:ZFIN;Acc:ZDB-GENE-131120-137]
## 3957 ENSGACG00000011081 groupXIX 13227769 oxysterol binding protein-like 5 [Source:ZFIN;Acc:ZDB-GENE-030131-5872]
## 3997 ENSGACG00000011130 groupXIX 13354738 mitochondrial ribosomal protein L23 [Source:ZFIN;Acc:ZDB-GENE-040625-12]
## 4026 ENSGACG00000011173 groupXIX 13427058
## 4027 ENSGACG00000011175 groupXIX 13430600 RAB19, member RAS oncogene family [Source:HGNC Symbol;Acc:HGNC:19982]
## 4031 ENSGACG00000011179 groupXIX 13455597 im:6904482 [Source:ZFIN;Acc:ZDB-GENE-050506-81]
## 4084 ENSGACG00000011713 groupXIX 14968720 early endosome antigen 1 [Source:ZFIN;Acc:ZDB-GENE-041111-270]
## 4091 ENSGACG00000011723 groupXIX 14986398 nudix (nucleoside diphosphate linked moiety X)-type motif 4a [Source:ZFIN;Acc:ZDB-GENE-031010-33]
## 4092 ENSGACG00000011725 groupXIX 14992460 ubiquitin-conjugating enzyme E2Nb [Source:ZFIN;Acc:ZDB-GENE-040426-1291]
## 4108 ENSGACG00000011745 groupXIX 15000177 nuclear receptor subfamily 1, group H, member 4 [Source:ZFIN;Acc:ZDB-GENE-040718-313]
## 4126 ENSGACG00000011771 groupXIX 15033199 si:dkey-103i16.1 [Source:ZFIN;Acc:ZDB-GENE-060503-47]
## 4213 ENSGACG00000011879 groupXIX 15132156 cation/H+ exchanger protein 2 [Source:ZFIN;Acc:ZDB-GENE-100825-2]
## 4221 ENSGACG00000011888 groupXIX 15143411 kelch domain containing 10 [Source:HGNC Symbol;Acc:HGNC:22194]
## 4242 ENSGACG00000011914 groupXIX 15189308 cholinergic receptor, muscarinic 2b [Source:ZFIN;Acc:ZDB-GENE-090410-3]
## 4265 ENSGACG00000011944 groupXIX 15214002 si:ch211-127m7.3 [Source:ZFIN;Acc:ZDB-GENE-141211-6]
## 4285 ENSGACG00000011975 groupXIX 15276451 KxDL motif containing 1 [Source:ZFIN;Acc:ZDB-GENE-040801-207]
## 4331 ENSGACG00000012034 groupXIX 15311956 leucine rich repeat containing 17 [Source:ZFIN;Acc:ZDB-GENE-030131-9774]
## 4344 ENSGACG00000012049 groupXIX 15324801 si:ch211-236c15.2 [Source:ZFIN;Acc:ZDB-GENE-120709-53]
## 4346 ENSGACG00000012053 groupXIX 15336362 round spermatid basic protein 1-like [Source:HGNC Symbol;Acc:HGNC:24765]
## 4357 ENSGACG00000012066 groupXIX 15348293 proline rich 5 (renal) [Source:ZFIN;Acc:ZDB-GENE-130530-791]
## 4359 ENSGACG00000012071 groupXIX 15357367 RAD52 homolog (S. cerevisiae) [Source:ZFIN;Acc:ZDB-GENE-050731-10]
## 4377 ENSGACG00000012099 groupXIX 15387902 ELKS/RAB6-interacting/CAST family member 1a [Source:ZFIN;Acc:ZDB-GENE-091214-5]
## 4383 ENSGACG00000012110 groupXIX 15399820
## 4462 ENSGACG00000012213 groupXIX 15460729 nudix (nucleoside diphosphate linked moiety X)-type motif 7 [Source:ZFIN;Acc:ZDB-GENE-131127-212]
## 4467 ENSGACG00000012221 groupXIX 15481309
## 4492 ENSGACG00000012349 groupXIX 15712132 choline kinase beta [Source:ZFIN;Acc:ZDB-GENE-030131-2928]
## 4521 ENSGACG00000012386 groupIII 50881 chemokine (C-X-C motif) receptor 4 [Source:HGNC Symbol;Acc:HGNC:2561]
## 4523 ENSGACG00000012388 groupXIX 15752174 progastricsin (pepsinogen C) [Source:HGNC Symbol;Acc:HGNC:8890]
## 4545 ENSGACG00000012412 groupXIX 15770641 arylsulfatase A [Source:ZFIN;Acc:ZDB-GENE-050320-118]
## 4570 ENSGACG00000012441 groupXIX 15776072
## 4584 ENSGACG00000012456 groupXIX 15776176
## 4586 ENSGACG00000012458 groupXIX 15837060 SH3 and multiple ankyrin repeat domains 3a [Source:ZFIN;Acc:ZDB-GENE-060503-369]
## 4597 ENSGACG00000012474 groupXIX 15918028 RAB, member of RAS oncogene family-like 2 [Source:ZFIN;Acc:ZDB-GENE-060503-464]
## 4638 ENSGACG00000012536 groupXIX 15947984
## 4639 ENSGACG00000012538 groupXIX 15950976
## 4642 ENSGACG00000012541 groupXIX 15963171 putative pyruvate dehydrogenase phosphatase isoenzyme 2 [Source:ZFIN;Acc:ZDB-GENE-000921-2]
## 4672 ENSGACG00000012580 groupXIX 16058808 receptor-interacting serine-threonine kinase 3 like [Source:ZFIN;Acc:ZDB-GENE-071115-4]
## 4693 ENSGACG00000012612 groupXIX 16082531 RNA binding motif protein 28 [Source:ZFIN;Acc:ZDB-GENE-040426-960]
## 4719 ENSGACG00000012640 groupXIX 16212950 hepatocyte growth factor b [Source:ZFIN;Acc:ZDB-GENE-041014-3]
## 4804 ENSGACG00000012758 groupXIX 16545546 family with sequence similarity 107, member B [Source:ZFIN;Acc:ZDB-GENE-031030-12]
## 4853 ENSGACG00000012835 groupXIX 16608508
## 4880 ENSGACG00000012874 groupXIX 16640414 protein phosphatase 6, regulatory subunit 2b [Source:ZFIN;Acc:ZDB-GENE-070705-441]
## 4922 ENSGACG00000013517 groupXIX 18023897 CCR4-NOT transcription complex, subunit 2 [Source:ZFIN;Acc:ZDB-GENE-070410-70]
## 4948 ENSGACG00000013552 groupXIX 18279323 tetratricopeptide repeat domain 38 [Source:ZFIN;Acc:ZDB-GENE-050522-318]
## 4958 ENSGACG00000013564 groupXIX 18286218 si:ch211-59c24.1 [Source:ZFIN;Acc:ZDB-GENE-060503-607]
## 4991 ENSGACG00000013611 groupXIX 18485565
## 4996 ENSGACG00000013617 groupXIX 18488096 calcium release activated channel regulator 2A [Source:HGNC Symbol;Acc:HGNC:28657]
## 5002 ENSGACG00000013623 groupXIX 18554402 si:dkeyp-2c8.2 [Source:ZFIN;Acc:ZDB-GENE-081031-100]
## 5020 ENSGACG00000013648 groupXIX 18713908 FYVE, RhoGEF and PH domain containing 4a [Source:ZFIN;Acc:ZDB-GENE-050420-347]
## 5056 ENSGACG00000013687 groupXIX 18747194 WEE1 homolog 2 (S. pombe) [Source:ZFIN;Acc:ZDB-GENE-030131-5682]
## 5064 ENSGACG00000013695 groupXIX 18764501 ATPase, H+ transporting, lysosomal, V1 subunit E1a [Source:ZFIN;Acc:ZDB-GENE-041212-51]
## 5071 ENSGACG00000013705 groupXIX 18768929 solute carrier family 25 (glutamate carrier), member 18 [Source:ZFIN;Acc:ZDB-GENE-041111-192]
## 5077 ENSGACG00000013714 groupXIX 18804888 cat eye syndrome chromosome region, candidate 1a [Source:ZFIN;Acc:ZDB-GENE-030902-4]
## 5119 ENSGACG00000013768 groupXIX 18860234 WAP four-disulfide core domain 1 [Source:ZFIN;Acc:ZDB-GENE-070112-352]
## 5124 ENSGACG00000013775 groupXIX 18901520 potassium voltage-gated channel, subfamily G, member 4a [Source:ZFIN;Acc:ZDB-GENE-050419-11]
## 5128 ENSGACG00000013779 groupXIX 18943143 si:dkey-246g23.2 [Source:ZFIN;Acc:ZDB-GENE-050419-100]
## 5131 ENSGACG00000013784 groupXIX 18973913 heat shock factor binding protein 1b [Source:ZFIN;Acc:ZDB-GENE-040426-1721]
## 5135 ENSGACG00000013788 groupXIX 18976351 zgc:173742 [Source:ZFIN;Acc:ZDB-GENE-030131-6489]
## 5209 ENSGACG00000013883 groupXIX 19632903 zgc:103697 [Source:ZFIN;Acc:ZDB-GENE-040912-104]
## 5217 ENSGACG00000013896 groupXIX 19648021 ring finger and SPRY domain containing 1 [Source:ZFIN;Acc:ZDB-GENE-061026-2]
## 5220 ENSGACG00000013899 groupXIX 19659023 ADP-ribosylation factor-like 2 binding protein [Source:ZFIN;Acc:ZDB-GENE-040426-1604]
## 5240 ENSGACG00000013931 groupXIX 19748754 apoptosis-inducing factor, mitochondrion-associated, 2 [Source:HGNC Symbol;Acc:HGNC:21411]
## 5246 ENSGACG00000013939 groupXIX 19756843 Bardet-Biedl syndrome 2 [Source:ZFIN;Acc:ZDB-GENE-020801-1]
## 5266 ENSGACG00000013963 groupXIX 19768602 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 9 [Source:ZFIN;Acc:ZDB-GENE-060503-611]
## 5292 ENSGACG00000013996 groupXIX 19940088
## 5353 ENSGACG00000014081 groupXIX 20045284 transport and golgi organization 6 homolog (Drosophila) [Source:ZFIN;Acc:ZDB-GENE-050419-237]
## 5359 ENSGACG00000014090 groupXIX 20079687 mitogen-activated protein kinase 8 interacting protein 1 [Source:ZFIN;Acc:ZDB-GENE-101025-1]
## 6745 ENSGACG00000018690 scaffold_654 2701 si:ch1073-89b12.1 [Source:ZFIN;Acc:ZDB-GENE-131121-340]
## 6747 ENSGACG00000018692 scaffold_654 7332 fatty acid desaturase 2 [Source:ZFIN;Acc:ZDB-GENE-011212-1]
## 7089 ENSGACG00000019136 groupIV 21957750 plexin C1 [Source:ZFIN;Acc:ZDB-GENE-030131-1620]
## 7984 ENSGACG00000020294 groupVII 15804561 V-set and immunoglobulin domain containing 1 [Source:HGNC Symbol;Acc:HGNC:28675]
## 8497 ENSGACG00000022681 groupXIX 8017901 Small nucleolar RNA SNORA19 [Source:RFAM;Acc:RF00413]
## Gene_Symbol
## 379
## 594 rnpc3
## 833 scdb
## 985 parvb
## 1005 parvg
## 1014 plxnb2b
## 1028 TUBGCP6
## 1042 appl2
## 1102 tmem117
## 1109 TWF1
## 1128 irak4
## 1138 ADAMTS20
## 1155 prickle1a
## 1161 PPHLN1 (1 of 3)
## 1162
## 1165 yaf2
## 1172 gxylt1b
## 1185 CNTN1 (1 of 2)
## 1193 PNPLA8 (1 of 2)
## 1195 dnajb9a
## 1197 dnm1l
## 1254 cald1b
## 1261 bpgm
## 1304 cnot4a
## 1338 ldhbb
## 1341 golt1ba
## 1355 slc35b4
## 1359 chchd3b
## 1381 NET1 (1 of 2)
## 1384 asb13b
## 1426 dhtkd1
## 1463 camk1db
## 1473 cdk16 (1 of 2)
## 1490
## 1518 ntn4
## 1534 usp3
## 1559 man2c1
## 1609 neil1
## 1612 commd4
## 1616 sema7a
## 1627 islr2
## 1630 stra6
## 1651 stoml1
## 1667 hexa
## 1671 eif2a
## 1710 ppcdc
## 1720 hacd3
## 1739 vwa9
## 1772 dennd4a
## 1798 map2k1
## 1939 snapc5
## 1946 smad6b
## 1951 smad3b
## 1975 aagab
## 2009 zgc:162898
## 2015 pias1b
## 2020 morf4l1
## 2072 ddb2
## 2075 kbtbd4
## 2101 ighmbp2
## 2127 chid1
## 2145 pddc1
## 2155 cd151
## 2180 si:ch211-247i17.1
## 2182 cracr2b
## 2187 tmem138
## 2194 tmem258
## 2195 myrf
## 2209 si:ch1073-89b12.1 (1 of 3)
## 2233 syt7a
## 2243
## 2248 si:dkey-201c1.2
## 2263
## 2268 pidd1
## 2288 zmp:0000001167
## 2307 athl1
## 2345 rab3il1
## 2361 hps5
## 2377 snrnp40
## 2381 gtf2h1
## 2445 fibina
## 2473 clpxb
## 2481 ap4e1
## 2487 gnb5a
## 2490 myo5c
## 2508 myo5ab
## 2533 rsl24d1
## 2564 TCF12 (1 of 2)
## 2571 CGNL1 (1 of 2)
## 2588 adam10b
## 2594 fan1
## 2613 sqrdl
## 2628 ctdspl2b
## 2656 parp16
## 2678
## 2687 ugt5a1
## 2813 cox5aa
## 2815 akip1
## 2857 psma1
## 2915 zgc:56106
## 2934 plekha7a
## 2954 syt8
## 2962 tnni2a.1
## 2981 lsp1
## 3080 si:ch1073-174d20.2
## 3107 rfwd3
## 3113 tmed6
## 3117 ZDHHC7 (1 of 2)
## 3132 ankrd27 (1 of 2)
## 3136 ankrd27 (2 of 2)
## 3152 zgc:162267 (2 of 2)
## 3158 stat3
## 3163
## 3168 ptprz1a
## 3185 aass
## 3199 cadps2
## 3221 asb15a
## 3255 pot1
## 3280
## 3283 zgc:101553
## 3311
## 3314 ftsj1
## 3350 CERK (1 of 2)
## 3359 si:ch211-286k11.4
## 3365 gramd4b
## 3376
## 3392 bet1l
## 3416 gtse1
## 3441
## 3442 cttnbp2
## 3463 cftr
## 3512 CAPZA2
## 3583 cav1
## 3584 cav2
## 3590 tes
## 3656 cep41
## 3674 dusp6
## 3703 tmtc3
## 3727 PIK3C2G
## 3736 PLEKHA5 (1 of 2)
## 3745 AEBP2 (1 of 2)
## 3747 ankdd1b
## 3752 PDE3A (2 of 2)
## 3760 BCAP29 (2 of 2)
## 3768 hbp1
## 3774 PRKAR2B
## 3790 blm
## 3814 alpk3a
## 3847 znf592
## 3866 zgc:153293
## 3868 RAB19 (1 of 3)
## 3880 cecr5
## 3900 ush1c
## 3929 mob2a
## 3942 ptprjb.2
## 3957 osbpl5
## 3997 mrpl23
## 4026
## 4027 RAB19 (2 of 3)
## 4031 im:6904482
## 4084 eea1
## 4091 nudt4a
## 4092 ube2nb
## 4108 nr1h4
## 4126 si:dkey-103i16.1
## 4213 cax2
## 4221 KLHDC10 (1 of 2)
## 4242 chrm2b
## 4265 si:ch211-127m7.3
## 4285 kxd1
## 4331 lrrc17 (1 of 2)
## 4344 si:ch211-236c15.2
## 4346 RSBN1L
## 4357 prr5
## 4359 rad52
## 4377 erc1a
## 4383
## 4462 nudt7
## 4467
## 4492 chkb
## 4521 CXCR4 (2 of 2)
## 4523 PGC
## 4545 arsa
## 4570
## 4584
## 4586 shank3a
## 4597 rabl2
## 4638
## 4639
## 4642 pdp2
## 4672 ripk3l
## 4693 rbm28
## 4719 hgfb
## 4804 fam107b
## 4853
## 4880 ppp6r2b
## 4922 cnot2
## 4948 ttc38
## 4958 si:ch211-59c24.1
## 4991
## 4996 CRACR2A (1 of 2)
## 5002 si:dkeyp-2c8.2
## 5020 fgd4a
## 5056 wee2
## 5064 atp6v1e1a
## 5071 slc25a18
## 5077 cecr1a
## 5119 wfdc1
## 5124 kcng4a
## 5128 si:dkey-246g23.2
## 5131 hsbp1b
## 5135 zgc:173742
## 5209 zgc:103697
## 5217 rspry1
## 5220 arl2bp
## 5240 AIFM2
## 5246 bbs2
## 5266 b3gnt9
## 5292
## 5353 tango6
## 5359 mapk8ip1
## 6745 si:ch1073-89b12.1 (3 of 3)
## 6747 fads2 (3 of 3)
## 7089 plxnc1
## 7984 VSIG1
## 8497 SNORA19
Suppose that I got a positive result on an HIV test. What’s the chance I am HIV positive? (Here we really mean that “I” am a randomly chosen person from the US population.)
OraQUICK advance rapid test: 99.4% specificity and 99.8% sensitivity (from MLO online).
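Refreshing from Wikipedia: sensitivity is the “true positive” rate (the chance that someone with HIV tests positive), and specificity is the “true negative” rate (the chance that someone without HIV tests negative). The simulation and calculations below use a true positive rate of 0.994 and a true negative rate of 0.998.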
There are currently around 1.1 million people with HIV in the US, out of a total of 328 million, giving an overall rate of 0.00335 = 0.335%.
We want to know \[\begin{aligned} & \P\{ \text{HIV+} | \text{positive test} \} \\ & \qquad = \frac{\P\{ \text{HIV+ and getting a positive test} \} }{ \P\{ \text{getting a positive test} \} } \end{aligned}\]
Start with a large sample from the US population, some of whom have HIV and others do not, and then give them all HIV tests.
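# Test characteristics and prevalence, matching the values used in the calculations
true_pos <- 0.994            # P(+ | HIV)
true_neg <- 0.998            # P(- | not HIV)
pop_rate <- 1.1e6 / 328e6    # overall rate of HIV, about 0.00335
N <- 1e6
# Simulate HIV status for N people
people <- data.frame(
    hiv = runif(N) < pop_rate)
# Simulate test results separately for those with and without HIV
people$test <- NA
people$test[people$hiv] <- ifelse(runif(sum(people$hiv)) < true_pos, "+", "-")
people$test[!people$hiv] <- ifelse(runif(sum(!people$hiv)) < true_neg, "-", "+")
addmargins(table(status=people$hiv, test=people$test))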
## test
## status - + Sum
## FALSE 994744 2003 996747
## TRUE 19 3234 3253
## Sum 994763 5237 1000000
For example, there are 2003 who do not have HIV but got a positive test result.
What is the proportion of people who got a positive test result who actually have HIV?
Now let’s compute: the probability we want is the proportion of people who got a positive test result who actually have HIV:
Of the 5237 people in this sample of \(10^6\) who had a positive test result, 3234 actually have HIV, a proportion of 3234/5237 = 61.75%.
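As a quick check of the arithmetic, here is a minimal sketch in R that recomputes this proportion from the counts printed in the table above. The names `counts` and `ppv_sim` are made up for illustration; the numbers are copied by hand from the table, not regenerated.

```r
# Counts copied from the table above (rows: HIV status, columns: test result).
# matrix() fills column by column, so the first two numbers are the "-" column.
counts <- matrix(
    c(994744, 19, 2003, 3234),
    nrow = 2,
    dimnames = list(status = c("FALSE", "TRUE"), test = c("-", "+"))
)

# Proportion of people with a positive test who actually have HIV.
ppv_sim <- counts["TRUE", "+"] / sum(counts[, "+"])
round(100 * ppv_sim, 2)
```

If the simulated `people` data frame is still in memory, `mean(people$hiv[people$test == "+"])` should give the same proportion directly.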
We want to compute the theoretical probability of having HIV given a positive test result, or \[ P(\text{HIV} \;|\; + ) = \frac{P(\text{HIV and}\; +)}{P(+)} \]
The probability that a randomly chosen person from the population has HIV and got a positive test on this test is \[\begin{aligned} P(\text{HIV}\; \text{and}\; +) &= P(\text{HIV}) \times P(+ \;|\; \text{HIV}) \\ &= 0.00335 \times 0.994 \\ &= 0.0033299 \end{aligned}\]
We’ll also need the complementary probability, \[\begin{aligned} P(\text{not HIV}\; \text{and}\; +) &= P(\text{not HIV}) \times P(+ \;|\; \text{not HIV}) \\ &= (1 - 0.00335) \times (1 - 0.998) \\ &= 0.0019933 \end{aligned}\]
And, the probability that a randomly chosen person from the population has a positive test result is \[\begin{aligned} P(+) &= P(\text{HIV and}\; +) + P(\text{not HIV and}\; +) \\ &= 0.0033299 + 0.0019933 \\ &= 0.0053232. \end{aligned}\]
Putting these together, we get that \[\begin{aligned} P(\text{HIV} \;|\; + ) &= \frac{0.0033299}{0.0053232} \\ &= 0.6255448. \end{aligned}\] In other words, we get a predicted probability of about 62.6%, which agrees well with the 61.75% found in the simulation.
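The same calculation can be wrapped in a small function, which makes it easy to see how the answer changes with the inputs. This is only a sketch: the function name `post_prob_positive` and its argument names are made up for illustration, and the values passed in are the ones used in the calculation above.

```r
# Hypothetical helper: P(condition | positive test), by Bayes' rule.
#   prior    = P(condition) in the population
#   true_pos = P(+ | condition)
#   true_neg = P(- | no condition)
post_prob_positive <- function(prior, true_pos, true_neg) {
    pos_and_cond <- prior * true_pos                  # P(condition and +)
    pos_and_no_cond <- (1 - prior) * (1 - true_neg)   # P(no condition and +)
    pos_and_cond / (pos_and_cond + pos_and_no_cond)   # divide by P(+)
}

# Values from the calculation above.
p_hiv_given_pos <- post_prob_positive(prior = 0.00335, true_pos = 0.994, true_neg = 0.998)
p_hiv_given_pos
```

Trying a smaller `prior` shows the main point of the example: the rarer the condition, the more of the positive tests are false positives, even though the test itself has not changed.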