Peter Ralph
12 March 2020 – Advanced Biological Statistics
Care, or at least think, about the data.
Look at the data.
Query the data.
Sanity check.
Communicate.
statistics are numerical summaries of data,
parameters are numerical attributes of a model.
confidence intervals
\(p\)-values
report effect sizes!
Central Limit Theorems:
experiment versus observational study
controls, randomization, replicates
samples: from what population?
statistical power: \(\sigma/\sqrt{n}\)
confounding factors
correlation versus causation
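The \(\sigma/\sqrt{n}\) in the power item above is the standard deviation of a sample mean, which is easy to check by simulation. The following is a minimal sketch in plain Python, assuming Normal data; the function name `sd_of_sample_means` and its default values are illustrative, not part of the course materials.

```python
import math
import random
import statistics


def sd_of_sample_means(n, sigma=2.0, reps=4000, seed=1):
    """Empirical SD of the mean of n Normal(0, sigma) draws.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


if __name__ == "__main__":
    sigma = 2.0
    for n in (4, 25, 100):
        observed = sd_of_sample_means(n, sigma=sigma)
        expected = sigma / math.sqrt(n)
        print(f"n={n:4d}  simulated={observed:.3f}  sigma/sqrt(n)={expected:.3f}")
```

Quadrupling the sample size only halves the noise in the estimate, which is why power grows slowly with \(n\).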
readable
descriptive
documented
columns are variables, rows are observations
semantically coherent
makes precise visual analogies
with real units
labeled
maximize information per unit ink
\(t\)-tests
ANOVA: ratios of mean-squares
Kaplan-Meier survival curves
Cox proportional hazard models
smoothing: loess
multiple comparisons: Bonferroni; FDR
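The two multiple-comparison corrections named above can be written in a few lines each. This is a sketch in plain Python, assuming that FDR here means the Benjamini-Hochberg procedure; the function names are illustrative.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (one common FDR procedure).

    Assumption: this is the FDR method intended; other procedures exist.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(pvals))
    print("BH (FDR):  ", benjamini_hochberg(pvals))
```

Bonferroni controls the chance of any false positive and so is the stricter of the two; Benjamini-Hochberg controls the expected proportion of false positives among the rejected hypotheses.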
simulation
simulation
oh, and simulation
permutation tests
goodness of fit
crossvalidation
interpolation
nonidentifiability
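As one concrete instance of the simulation-based tools in this list, here is a sketch of a two-sample permutation test for a difference in means. It is written in plain Python under stated assumptions: the test statistic (absolute difference in means), the function name, and the example numbers are all illustrative choices.

```python
import random
import statistics


def permutation_test(x, y, n_perm=10000, seed=1):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper: shuffles the pooled values and counts how often the
    shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    # Adding 1 to both counts treats the observed labeling as one permutation,
    # so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    control = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
    treated = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8]
    print("p-value:", permutation_test(control, treated, n_perm=2000))
```

The only assumption the test needs is that group labels are exchangeable under the null hypothesis, which is what randomization in an experiment provides.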
Negative-binomial: Differential gene expression
Poisson: non-normalized transcript counts
Cauchy: # of cases of Coronavirus in cities around the world
Normal (gaussian): height
Beta-binomial: Coin flips with random coin
Exponential: radioactive decay
LogNormal: milk production by cows
Beta: probability that a baseball player gets a hit
Gamma: Amount of snowfall accumulated
Dirichlet: probability distribution over n proportions that must all sum to 1. For example, if three subspecies make up a species population, this would be an appropriate prior for their proportions
Poisson: total counts of coffees sold at Starbucks at any particular time
Weibull: Survival analysis
Multivariate Normal: probability of something happening in 3D space
Poisson: birth rates
Poisson: counts of rare plant species (from presence/absence surveys)
Weibull: time until someone falls ill with coronavirus
Cauchy: Distance commuted on a daily basis
Multivariate normal: height, weight, and shoe size
Weibull: hazard rate in survival analysis
F distribution: the F statistic
Multivariate Normal: spatial correlation of bike trip
Poisson distribution: observing transcript counts, looking at the effects of age and exposure
Negative binomial: how many eggs a chicken lays until the 5th egg with two yolks
logNormal: rent prices across Manhattan
Poisson: number of bursts of EEG activity
Binomial: number of people on the bus who should be at home ‘cause they are sick
Poisson: number of cosmic particles going through a detector
Poisson: number of toilet paper sheets with errors in perforation
Beta-binomial: number of polls (across different political analysis sources?) that say that Bernie will win
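Several of these examples are easiest to understand by simulating them. The sketch below takes the "coin flips with random coin" example and compares it with flips of a fixed coin that has the same average probability of heads. It is plain Python; the Beta(2, 2) coin, the 20 flips, and all function names are assumptions made for illustration.

```python
import random
import statistics


def binomial_draw(rng, n, p):
    """Number of heads in n flips of a coin with heads probability p."""
    return sum(rng.random() < p for _ in range(n))


def simulate_coins(n_flips=20, a=2.0, b=2.0, reps=5000, seed=1):
    """Return (fixed-coin counts, random-coin counts).

    The fixed coin uses p = a / (a + b) (Binomial); the random coin draws a
    new p from Beta(a, b) before each set of flips (Beta-binomial).
    """
    rng = random.Random(seed)
    mean_p = a / (a + b)
    fixed = [binomial_draw(rng, n_flips, mean_p) for _ in range(reps)]
    random_coin = [
        binomial_draw(rng, n_flips, rng.betavariate(a, b)) for _ in range(reps)
    ]
    return fixed, random_coin


if __name__ == "__main__":
    fixed, random_coin = simulate_coins()
    print("Binomial:      mean", statistics.fmean(fixed),
          "variance", statistics.variance(fixed))
    print("Beta-binomial: mean", statistics.fmean(random_coin),
          "variance", statistics.variance(random_coin))
```

Both sets of counts have the same mean, but the random-coin counts are much more spread out. That overdispersion is the usual reason to reach for a Beta-binomial (or, for counts, a Negative-binomial instead of a Poisson).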
lm()
\[ X_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk} \]
linear: describes the +s
R’s formulas are powerful (model.matrix()!!)
least-squares regression: implies Gaussian noise
model comparison: with ANOVA and the \(F\) test
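To make the additive model \(X_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}\) concrete, the sketch below simulates from it and recovers the parameters. It relies on a standard fact: in a balanced design with the effects constrained to sum to zero, the least-squares estimates are the grand mean and the deviations of the marginal means from it. This is plain Python rather than `lm()`, and every name and parameter value is an illustrative assumption; `lm()` by default uses treatment contrasts, so its coefficients are parametrized differently.

```python
import random
import statistics


def simulate_additive(mu, alpha, beta, n_rep=50, sigma=1.0, seed=1):
    """Simulate X_ijk = mu + alpha_i + beta_j + Normal(0, sigma) noise.

    Returns a dict mapping (i, j) to the list of n_rep replicates in that cell.
    """
    rng = random.Random(seed)
    return {
        (i, j): [mu + a + b + rng.gauss(0.0, sigma) for _ in range(n_rep)]
        for i, a in enumerate(alpha)
        for j, b in enumerate(beta)
    }


def fit_additive(data, n_i, n_j):
    """Least-squares estimates for a balanced design, sum-to-zero effects."""
    everything = [x for cell in data.values() for x in cell]
    mu_hat = statistics.fmean(everything)
    alpha_hat = [
        statistics.fmean([x for j in range(n_j) for x in data[(i, j)]]) - mu_hat
        for i in range(n_i)
    ]
    beta_hat = [
        statistics.fmean([x for i in range(n_i) for x in data[(i, j)]]) - mu_hat
        for j in range(n_j)
    ]
    return mu_hat, alpha_hat, beta_hat


if __name__ == "__main__":
    alpha, beta = [-1.0, 1.0], [-0.5, 0.0, 0.5]  # made-up true effects
    data = simulate_additive(10.0, alpha, beta, n_rep=200)
    print(fit_additive(data, len(alpha), len(beta)))
```

Because the noise is Gaussian and the fit is least squares, these are also the maximum likelihood estimates, which is the sense in which least-squares regression implies Gaussian noise.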
Random effects:
ALGAE ~ TREAT + (1|PATCH)
kinda picky
can climb the posterior likelihood surface (optimizing())
or, can skateboard around on it (sampling())
\[\begin{aligned} &\text{(response distribution)} \\ &\qquad \sim \text{(inverse link function)} + \text{(linear predictor)} \end{aligned}\]
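One instance of this recipe is Poisson regression: the response distribution is Poisson, the inverse link is \(\exp\), and the linear predictor is \(b_0 + b_1 x\). The sketch below simulates such data and fits it by Newton's method on the log-likelihood (equivalently, Fisher scoring). It is plain Python with a single covariate, not what `glm()` does internally line for line, and all names and parameter values are illustrative assumptions.

```python
import math
import random


def rpois(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson_glm(x, y, n_iter=25):
    """Fit y ~ Poisson(exp(b0 + b1 * x)) by Newton's method.

    Hypothetical helper for one covariate; solves the 2x2 system by hand.
    """
    b0, b1 = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate_poisson_data(b0, b1, n=2000, seed=1):
    """Covariate x uniform on [-1, 1]; response Poisson with log link."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [rpois(rng, math.exp(b0 + b1 * xi)) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_poisson_data(1.0, 0.5)  # made-up true coefficients
    print(fit_poisson_glm(x, y))
```

Swapping in a different response distribution and inverse link (for example Binomial with the logistic function) changes the score and information terms but not the overall structure.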
glm()/glmer
glm(er):
stan:
brms: