
Summary

Peter Ralph

3 December 2020 – Advanced Biological Statistics

Wrap-up

Steps in data analysis

Care, or at least think, about the data.
Look at the data.
Query the data.
Check the results.
Communicate.

Modeling, and Stan

How well a statistical method works depends on the situation.
We can describe the “situation” with a probability model.
Inference usually works best if the probabilistic model reflects reality.
Explicit models make it easy to simulate, and therefore to test your methods.
Stan lets you do inference using (almost) arbitrary models.
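
As a minimal sketch of "simulate, and therefore test your methods": the code below assumes a hypothetical model in which observations are Normal with known true mean, and checks how often the usual 95% \(t\)-based confidence interval covers that mean. The parameter values (`mu`, `sigma`, `n`, `nreps`) are illustrative assumptions, not taken from the course.

```r
# Simulate from an explicit model and check a method against the known truth.
# Assumed (hypothetical) model: y_i ~ Normal(mu, sigma), for i = 1, ..., n.
set.seed(1)
mu <- 3       # true mean (assumed for illustration)
sigma <- 2    # true standard deviation (assumed)
n <- 20       # sample size per simulated data set
nreps <- 2000 # number of simulated data sets

covered <- replicate(nreps, {
    y <- rnorm(n, mean = mu, sd = sigma)
    ci <- t.test(y)$conf.int   # 95% confidence interval for the mean
    ci[1] <= mu && mu <= ci[2] # did the interval cover the truth?
})
coverage <- mean(covered)
coverage  # should be close to 0.95 if the method suits this situation
```

Changing the simulated model (for instance, adding outliers) and rerunning is one way to see how the method's performance depends on the situation.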

Hierarchical Bayesian models

It is often possible to infer things about populations that we can’t infer about individuals.
Doing so leads to sharing of information (or, “power”) between samples, and can improve accuracy.
Priors (and hyperpriors) on individual parameters provide a good way to do this.
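
The gain in accuracy from sharing information can be sketched with a small simulation. This is a simplification, not a full hierarchical Bayesian fit: it assumes a Normal-Normal model in which the between-group standard deviation `tau` and within-group standard deviation `sigma` are known, and plugs in the grand mean for the population mean. All names and values are hypothetical.

```r
# Shrinkage toward the population mean, under an assumed Normal-Normal model:
#   theta_j ~ Normal(mu, tau)                 (true group means)
#   ybar_j  ~ Normal(theta_j, sigma/sqrt(n))  (observed group means)
set.seed(2)
ngroups <- 50; n <- 5
mu <- 10; tau <- 1; sigma <- 3   # assumed values, for illustration only
theta <- rnorm(ngroups, mean = mu, sd = tau)
ybar <- rnorm(ngroups, mean = theta, sd = sigma / sqrt(n))

# Weight given to each group's own mean; the rest goes to the grand mean.
# With tau and sigma known, this is the posterior-mean weight.
w <- tau^2 / (tau^2 + sigma^2 / n)
shrunk <- mean(ybar) + w * (ybar - mean(ybar))

# Mean squared error of each estimator of the true group means
mse_raw <- mean((ybar - theta)^2)
mse_shrunk <- mean((shrunk - theta)^2)
c(raw = mse_raw, shrunk = mse_shrunk)
```

In a full hierarchical model, `tau` and `sigma` would get priors and be inferred along with the group means, rather than being fixed as they are here.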

Concepts

statistic versus parameter
quantifying uncertainty
experiment vs observation
controls
statistical power/sensitivity
tidy data
Markov chain Monte Carlo
permutation test
multiple comparisons
shrinkage and sharing power

probability models
simulation
\(p\)-values
hypothesis testing
confidence and credible intervals
linear models
random effects
prior, likelihood, and posterior
goodness-of-fit

Distributions:

Central Limit Theorem
Gaussian/Normal
Student’s \(t\)
Binomial
Beta
Beta-Binomial
Exponential
Gamma
Cauchy
Poisson

Visualization:

center, spread, outliers
histograms
scatter plots
boxplots
maximize information per unit of ink

Statistical models:

ANOVA, partition of variance
least-squares fitting \(\sim\) Gaussian
Beta-Binomial
logistic linear models
robust linear models
Generalized Linear (Mixed) Models

Identify the GLM

Which response distribution for the GLM?

How the number of pumpkins per vine depends on fertilizer and water amount.
How distance from home to workplace is predicted by income, job category, and city.
How (presence or absence of) hip dysplasia in dogs depends on age and breed.
How doughnut weight varies between and within bakeries and doughnut types.
How house prices are predicted by elevation, distance to stores, and square footage.

Options: normal / binomial / poisson / gamma / cauchy
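
For reference, here is a sketch of how a response distribution is passed to base R's `glm()` through its `family` argument, using simulated count data. The variable names and coefficient values are hypothetical. Base R provides `gaussian()`, `binomial()`, `poisson()`, and `Gamma()` families, but no Cauchy family, so a Cauchy response needs other tools.

```r
# Fit a GLM to simulated counts and check that it recovers the known truth.
set.seed(3)
n <- 500
x <- runif(n, 0, 2)
# Hypothetical truth: log(mean count) = 0.5 + 0.8 * x
counts <- rpois(n, lambda = exp(0.5 + 0.8 * x))

fit <- glm(counts ~ x, family = poisson(link = "log"))
coef(fit)  # estimates should be near the assumed values 0.5 and 0.8
```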

An advertisement:

The brms package lets you

fit hierarchical models using Stan

with mixed-model syntax!!!

library(brms)

# e.g., a Poisson response with a log link
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = poisson(link='log'))
# or a Student's t response with an identity link
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = student(link='identity'))
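
The calls above need a data frame `xy` with columns `z`, `x`, `y`, and a grouping factor `f`. One way to try them out is on simulated data. The sketch below builds such a data frame in base R for the Poisson case; the group count, sample size, and coefficient values are all assumptions made for illustration.

```r
# Simulate data matching the formula z ~ x + y + (1 + y | f), Poisson response.
set.seed(4)
ngroups <- 8; n <- 200
f <- factor(sample(letters[1:ngroups], n, replace = TRUE))
x <- rnorm(n)
y <- rnorm(n)

# Group-level intercepts and slopes on y: the "(1 + y | f)" part of the formula
a <- rnorm(ngroups, mean = 0.5, sd = 0.3)
b <- rnorm(ngroups, mean = 0.2, sd = 0.2)

# Linear predictor on the log scale, with an assumed slope of 0.4 on x
log_mean <- a[as.integer(f)] + 0.4 * x + b[as.integer(f)] * y
xy <- data.frame(x = x, y = y, f = f,
                 z = rpois(n, lambda = exp(log_mean)))
head(xy)
```

Because the true values are known, the posterior from `brm()` can be compared against them, which is the same simulate-then-test idea as before.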