\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Summary

Peter Ralph

3 December 2020 – Advanced Biological Statistics

Wrap-up

Steps in data analysis

  1. Care, or at least think, about the data.

  2. Look at the data.

  3. Query the data.

  4. Check the results.

  5. Communicate.

Modeling, and Stan

  1. How well a statistical method works depends on the situation.

  2. We can describe the “situation” with a probability model.

  3. Inference usually works best if the probabilistic model reflects reality .

  4. Explicit models make it easy to simulate, and therefore test your methods.

  5. Stan lets you do inference using (almost) arbitrary models.

Hierarchical Bayesian models

  1. It is often possible to infer things about populations that we can’t infer about individuals.

  2. Doing so leads to sharing of information (or, “power”) between samples, and can improve accuracy.

  3. Priors (and hyperpriors) on individual parameters provides a good way to do this.

Concepts

  • statistic versus parameter
  • quantifying uncertainty
  • experiment vs observation
  • controls
  • statistical power/sensitivity
  • tidy data
  • Markov chain Monte Carlo
  • permutation test
  • multiple comparisons
  • shrinkage and sharing power
  • probability models
  • simulation
  • \(p\)-values
  • hypothesis testing
  • confidence and credible intervals
  • linear models
  • random effects
  • prior, likelihood, and posterior
  • goodness-of-fit

Distributions:

  • Central Limit Theorem
  • Gaussian/Normal
  • Student’s \(t\)
  • Binomial
  • Beta
  • Beta-Binomial
  • Exponential
  • Gamma
  • Cauchy
  • Poisson

Visualization:

  • center, spread, outliers
  • histograms
  • scatter plots
  • boxplots
  • maximize information per unit of ink

Statistical models:

  • ANOVA, partition of variance
  • least-squares fitting \(\sim\) Gaussian
  • Beta-Binomial
  • logistic linear models
  • robust linear models
  • Generalized Linear (Mixed) Models

Identify the GLM

Which response distribution for the GLM?

  1. How number of pumpkins per vine depends on fertilizer and water amount.

  2. How distance from home to workplace is predicted by income, job category, and city.

  3. How (presence or absence of) hip dysplasia in dogs depends on age and breed.

  4. How doughnut weight varies between and within bakeries and doughnut types.

  5. How house prices are predicted by elevation, distance to stores, and square footage.

Options: normal / binomial / poisson / gamma / cauchy

An advertisement:

The brms package lets you

fit hierarchical models using Stan

with mixed-model syntax!!!

# e.g.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = poisson(link='log'))
# or
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = student(link='identity'))
brms logo
// reveal.js plugins