Peter Ralph
Advanced Biological Statistics
A \(p\)-value is the probability of seeing a result at least as surprising as what was observed in the data, if the null hypothesis is true.
Usually, this means
which can all be defined to suit the situation.
A small \(p\)-value means: if the null hypothesis were true, then you’d be really unlikely to see something like what you actually did.
So, either the “null hypothesis” is not a good description of reality or something surprising happened.
How useful this is depends on the null hypothesis.
##
## Welch Two Sample t-test
##
## data: airbnb$price[airbnb$instant_bookable] and airbnb$price[!airbnb$instant_bookable]
## t = 3.6482, df = 5039.8, p-value = 0.0002667
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.475555 14.872518
## sample estimates:
## mean of x mean of y
## 124.6409 114.9668
##
## One Sample t-test
##
## data: airbnb$price
## t = 91.32, df = 5601, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 116.9734 122.1058
## sample estimates:
## mean of x
## 119.5396
Is that \(p\)-value useful?
My hypothesis: People tend to have longer index fingers on the hand they write with because writing stretches the ligaments.
(class survey) How many people have a longer index finger on the hand they write with?
(class survey) Everyone flip a coin:
ifelse(runif(1) < 0.5, "H", "T")
We want to estimate the parameter
\[\begin{equation} \theta = \P(\text{random person has writing finger longer}) , \end{equation}\]
and now we have a fake dataset with \(\theta = 1/2\).
Let’s get some more data:
n <- 37 # class size
sum(ifelse(runif(n) < 1/2, "H", "T") == "H")
Now we can estimate the \(p\)-value for the hypothesis that \(\theta = 1/2\).
A faster method:
replicate(1000, sum(rbinom(n, 1, 1/2) > 0))
or, equivalently,
rbinom(1000, n, 1/2)
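To turn simulated counts like these into an estimated \(p\)-value, one option is to ask how often a simulated count lands at least as far from \(n/2\) as the observed one. This is only a sketch: `observed` is a made-up placeholder (the class survey result is not recorded here), and "at least as far from \(n/2\)" is one possible choice of what counts as "at least as surprising".

```r
set.seed(1)       # only so the sketch is reproducible
n <- 37           # class size, as above
observed <- 25    # HYPOTHETICAL count; replace with the real survey result

# simulated counts of "longer writing finger" if theta = 1/2
sims <- rbinom(10000, n, 1/2)

# proportion of simulations at least as far from n/2 as the observed count
p_sim <- mean(abs(sims - n/2) >= abs(observed - n/2))
p_sim
```

More simulations give a less noisy estimate; the 10000 used here is an arbitrary choice.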
Either math:
Or, computers. (maybe math, maybe simulation, maybe both)
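For the coin-flipping null, the "math" route can also be done by computer: the sketch below adds up binomial probabilities of every count at least as far from \(n/2\) as the observed one, then compares with R's built-in `binom.test()`. As before, `observed` is a hypothetical placeholder, not the actual class result.

```r
n <- 37           # class size, as above
observed <- 25    # HYPOTHETICAL count; replace with the real survey result

# all possible counts, and which ones are at least as extreme as observed
k <- 0:n
extreme <- abs(k - n/2) >= abs(observed - n/2)

# exact two-sided p-value under theta = 1/2
p_exact <- sum(dbinom(k[extreme], n, 1/2))
p_exact

# the built-in exact test should agree when the null is theta = 1/2
p_builtin <- binom.test(observed, n, p = 1/2)$p.value
p_builtin
```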
So, where did this \(p\)-value come from?
##
## Welch Two Sample t-test
##
## data: airbnb$price[airbnb$instant_bookable] and airbnb$price[!airbnb$instant_bookable]
## t = 3.6482, df = 5039.8, p-value = 0.0002667
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.475555 14.872518
## sample estimates:
## mean of x mean of y
## 124.6409 114.9668
The \(t\) distribution! (see separate slides)
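As a rough check, the reported \(p\)-value and confidence interval can be recovered from the numbers printed in the output above, using only the \(t\) distribution functions `pt()` and `qt()`. The variable names are chosen here for illustration, and because the inputs are rounded, the results should match the printed values only approximately.

```r
# values copied from the Welch two-sample t-test output above
t_stat <- 3.6482
df <- 5039.8
diff_means <- 124.6409 - 114.9668   # mean of x minus mean of y

# two-sided p-value: chance that a t-distributed value is at least this far from 0
p_val <- 2 * pt(-abs(t_stat), df = df)
p_val

# standard error implied by t = (difference in means) / SE
se <- diff_means / t_stat

# 95% confidence interval: difference +/- (t quantile) * SE
ci <- diff_means + c(-1, 1) * qt(0.975, df = df) * se
ci
```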