Peter Ralph
15 October – Advanced Biological Statistics
##
## Welch Two Sample t-test
##
## data: airbnb$price[airbnb$instant_bookable] and airbnb$price[!airbnb$instant_bookable]
## t = 3.6482, df = 5039.8, p-value = 0.0002667
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.475555 14.872518
## sample estimates:
## mean of x mean of y
## 124.6409 114.9668
But the \(t\) test relies on Normality. Is the distribution of AirBnB prices too “weird”? How can we be sure?
Methods:
Remove the big values and try again.
Use a nonparametric test.
(demonstration)
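A minimal sketch of both methods. It runs on simulated right-skewed “prices” rather than the real airbnb data, and the lognormal parameters, group sizes, and the cutoff of 500 are arbitrary assumptions chosen only for illustration. The nonparametric test shown is the Wilcoxon rank-sum (Mann-Whitney) test, base R's `wilcox.test()`.

```r
# Simulated stand-in for the price data: two groups of right-skewed values.
# (Assumed lognormal parameters; these are NOT the real AirBnB prices.)
set.seed(42)
price_a <- rlnorm(400, meanlog = 4.7, sdlog = 0.6)
price_b <- rlnorm(400, meanlog = 4.6, sdlog = 0.6)

# Method 1: drop the big values (here, anything above an arbitrary cutoff)
# and redo the Welch t test on what is left.
cutoff <- 500
t_trimmed <- t.test(price_a[price_a <= cutoff], price_b[price_b <= cutoff])

# Method 2: a nonparametric test. The Wilcoxon rank-sum test uses only the
# ranks of the values, so a few huge prices cannot dominate it.
w_test <- wilcox.test(price_a, price_b)

c(trimmed_t_p = t_trimmed$p.value, wilcoxon_p = w_test$p.value)
```

The two methods answer slightly different questions: the trimmed \(t\) test compares mean prices below the cutoff, while the Wilcoxon test asks whether values in one group tend to be larger than values in the other.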
Observation: If there were no meaningful difference in prices between “instant bookable” and not, then randomly shuffling that label wouldn't change anything.
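Strategy:
Randomly shuffle the instant_bookable column.
Compute the difference in mean price between the shuffled groups.
Repeat many times.
Find the proportion of shuffled differences at least as large as the real one.
Why is this a \(p\)-value? For what hypothesis?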
fake_is_instant <- sample(airbnb$instant_bookable)
(mean(airbnb$price[fake_is_instant], na.rm=TRUE) -
mean(airbnb$price[!fake_is_instant], na.rm=TRUE))
## [1] 2.837541
real_diff <- (mean(airbnb$price[airbnb$instant_bookable], na.rm=TRUE) -
mean(airbnb$price[!airbnb$instant_bookable], na.rm=TRUE))
permuted_diffs <- replicate(1000, {
fake_is_instant <- sample(airbnb$instant_bookable)
(mean(airbnb$price[fake_is_instant], na.rm=TRUE) -
mean(airbnb$price[!fake_is_instant], na.rm=TRUE))
} )
hist(permuted_diffs, xlab="shuffled differences in mean", xlim=range(c(permuted_diffs, real_diff)))
abline(v=real_diff, col='red', lwd=3)
## [1] 0
The difference in price between instant bookable and not instant bookable is highly statistically significant (\(p \approx 0.001\), permutation test).
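To turn the shuffled differences into a \(p\)-value, count how often a shuffled difference is at least as extreme as the observed one. Below is a self-contained sketch on simulated data; the group sizes, the exponential distribution, and the names `label`, `value`, `perm_diffs`, and `p_perm` are made up for illustration. Taking absolute values gives a two-sided test of the null hypothesis that the label is unrelated to the response. If none of 1000 shuffles is as extreme, the estimate is 0, and the honest summary is that \(p\) is smaller than about 1/1000.

```r
# Simulated data: a skewed response and a TRUE/FALSE label
# (hypothetical stand-ins for price and instant_bookable).
set.seed(1)
label <- sample(c(TRUE, FALSE), 300, replace = TRUE)
value <- rexp(300, rate = 1/100) + ifelse(label, 30, 0)

# observed difference in means between the two groups
obs_diff <- mean(value[label]) - mean(value[!label])

# difference in means after randomly shuffling the labels, many times;
# the shuffle is inside replicate() so each repetition is a new permutation
perm_diffs <- replicate(1000, {
    fake_label <- sample(label)
    mean(value[fake_label]) - mean(value[!fake_label])
})

# two-sided permutation p-value: proportion of shuffled differences
# at least as far from zero as the observed one
p_perm <- mean(abs(perm_diffs) >= abs(obs_diff))
p_perm
```

A common refinement reports \((k+1)/(n+1)\), where \(k\) of \(n\) shuffles were as extreme, so that the estimate is never exactly zero.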
Do the analogous thing for the ANOVA comparing price between neighbourhoods:
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## neighbourhood 91 6015248 66102 7.6277 < 2.2e-16 ***
## Residuals 5510 47749952 8666
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
true_F <- anova(lm(price ~ neighbourhood, data=airbnb))[["F value"]][1]
# do this a lot of times:
## DO NOT put the randomness here: fake_neighbourhood <- sample(airbnb$neighbourhood)
fake_Fs <- replicate(1000, {
fake_neighbourhood <- sample(airbnb$neighbourhood) # randomness must be here
anova(lm(price ~ fake_neighbourhood, data=airbnb))[["F value"]][1]
})
hist(fake_Fs, xlim=range(fake_Fs, true_F), xlab="permuted F values")
abline(v=true_F, col='red', lwd=2)
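The same recipe gives a \(p\)-value for the ANOVA, except that only large F statistics count as extreme, so the test is one-sided. A self-contained sketch on simulated data; the five groups, their effects, and the names `sim`, `obs_F`, `perm_Fs`, and `p_F` are assumptions, not the airbnb data.

```r
# made-up data: 5 groups of 40, with small differences in group means
set.seed(2)
grp <- factor(rep(letters[1:5], each = 40))
y <- rexp(200, rate = 1/100) + 10 * as.numeric(grp)
sim <- data.frame(y = y, grp = grp)

# observed F statistic from the one-way ANOVA
obs_F <- anova(lm(y ~ grp, data = sim))[["F value"]][1]

# F statistics after shuffling the group labels (shuffle inside replicate!)
perm_Fs <- replicate(1000, {
    fake_grp <- sample(sim$grp)
    anova(lm(y ~ fake_grp, data = sim))[["F value"]][1]
})

# one-sided p-value: proportion of permuted F values at least as large
p_F <- mean(perm_Fs >= obs_F)
p_F
```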
pattern discovery
efficient summary of information
visual/spatial analogy for quantitative patterns
aim to maximize information and minimize ink
Above all else show the data.
Distributions of litter sizes by Order, and Family, in the PanTHERIA dataset:
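library(dplyr)  # provides %>%, filter(), and select()
# read_pantheria() is a helper defined elsewhere (not part of base R or dplyr)
pantheria <- read_pantheria("../Datasets/PanTHERIA")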
# look at most common orders
order_nums <- sort(table(pantheria$Order))
big_orders <- names(order_nums)[order_nums > 150]
##
## Microbiotheria Tubulidentata Dermoptera Notoryctemorphia
## 1 1 2 2
## Proboscidea Hyracoidea Monotremata Sirenia
## 3 4 5 5
## Paucituberculata Pholidota Pilosa Macroscelidea
## 6 8 10 15
## Perissodactyla Scandentia Cingulata Peramelemorphia
## 17 20 21 21
## Erinaceomorpha Afrosoricida Dasyuromorphia Cetacea
## 24 51 71 84
## Didelphimorphia Lagomorpha Diprotodontia Artiodactyla
## 87 92 143 240
## Carnivora Primates Soricomorpha Chiroptera
## 286 376 428 1116
## Rodentia
## 2277
px <- (pantheria %>% filter(Order %in% big_orders)
%>% filter(!is.na(LitterSize))
%>% select(Order, Family, Genus, Species, LitterSize))
for (xn in c("Order", "Family", "Genus")) px[[xn]] <- factor(px[[xn]])
summary(px)
##           Order                 Family               Genus     
##  Artiodactyla:178   Muridae         : 242   Microtus    :  38  
##  Carnivora   :209   Cricetidae      : 239   Myotis      :  38  
##  Chiroptera  :465   Sciuridae       : 158   Crocidura   :  36  
##  Primates    :209   Vespertilionidae: 135   Peromyscus  :  32  
##  Rodentia    :883   Bovidae         : 110   Sorex       :  32  
##  Soricomorpha:116   Phyllostomidae  : 106   Spermophilus:  31  
##                     (Other)         :1070   (Other)     :1853  
##    Species            LitterSize    
##  Length:2060        Min.   : 0.960  
##  Class :character   1st Qu.: 1.000  
##  Mode  :character   Median : 1.970  
##                     Mean   : 2.489  
##                     3rd Qu.: 3.490  
##                     Max.   :11.300  
##
## [1] 0.98 4.50 3.74 5.72 4.98 1.22 1.00 1.22 1.01 1.02 1.02
## [12] 1.02 1.02 1.39 1.39 3.89 2.00 1.09 1.73 1.45 1.11 1.02
## [23] 1.01 1.01 1.00 1.01 1.02 1.02 1.01 1.01 1.05 1.84 2.00
## [34] 2.00 2.00 2.31 2.00 2.00 1.93 1.04 1.94 3.00 2.19 1.94
## [45] 2.36 2.62 3.00 1.29 4.91 5.30 4.29 6.27 4.00 4.99 1.50
## [56] 1.50 2.00 3.40 1.01 1.02 1.01 2.00 1.22 1.00 1.73 1.00
## [67] 0.99 1.00 7.29 6.12 1.00 1.00 1.00 1.00 1.00 1.01 1.01
## [78] 1.01 1.02 1.01 1.02 1.01 1.02 1.01 1.02 1.02 1.02 1.00
## [89] 1.00 1.00 1.02 1.00 1.02 1.01 1.02 1.02 1.01 1.02 3.09
## [100] 1.09 1.00 1.00 0.99 1.00 0.99 2.91 3.68 4.00 3.45 5.28
## [111] 5.34 5.53 3.46 3.40 3.50 6.06 5.71 5.20 7.14 3.99 0.99
## [122] 2.99 2.99 3.00 3.59 4.29 1.99 3.37 1.99 1.73 3.74 3.00
## [133] 2.99 3.89 3.00 2.44 4.04 3.99 3.49 1.22 1.50 4.86 1.00
## [144] 1.00 1.00 1.01 1.50 2.50 2.29 2.12 0.96 0.98 1.00 3.40
## [155] 5.39 3.95 2.62 2.52 1.21 0.98 1.00 0.98 1.02 1.00 1.00
## [166] 1.00 1.00 1.34 1.41 1.38 3.49 3.97 3.94 2.95 3.45 4.62
## [177] 1.70 2.75 3.63 2.60 2.62 1.00 1.00 3.78 3.49 3.85 2.79
## [188] 2.50 1.97 0.99 4.99 1.18 4.34 3.88 6.24 4.99 4.99 3.46
## [199] 4.66 3.67 1.00 0.99 1.00 11.30 2.83 2.75 3.06 1.93 1.01
## [210] 4.00 1.00 1.00 1.53 1.10 1.04 1.11 1.01 0.96 1.29 2.60
## [221] 2.19 3.54 2.25 2.50 1.00 1.00 1.00 1.50 1.22 1.22 1.00
## [232] 1.00 1.39 4.14 3.00 1.73 1.00 3.00 2.00 3.49 2.89 1.71
## [243] 0.96 1.00 3.49 4.00 1.00 3.88 1.50 1.00 1.00 1.00 1.22
## [254] 1.01 4.37 0.97 2.50 1.87 5.83 3.42 1.94 1.00 0.99 0.98
## [265] 0.99 0.98 1.65 1.01 1.00 2.68 2.15 1.07 1.01 1.02 1.02
## [276] 1.02 1.00 1.50 1.00 1.79 2.00 1.00 1.22 2.00 2.30 1.00
## [287] 0.98 0.99 1.00 0.98 1.94 0.98 3.60 2.95 2.47 1.50 2.14
## [298] 1.34 3.45 1.90 1.01 1.05 1.01 1.00 1.01 0.98 0.99 1.00
## [309] 3.14 3.49 1.66 3.29 3.36 2.99 1.99 3.94 3.00 3.99 2.49
## [320] 2.99 2.00 0.98 2.26 2.04 0.98 0.98 0.99 0.99 2.00 2.31
## [331] 0.99 0.98 1.00 0.98 1.00 1.01 1.00 1.00 0.99 1.02 1.97
## [342] 5.39 3.49 3.60 3.49 2.00 1.00 2.00 1.40 1.50 1.25 0.99
## [353] 1.00 0.99 1.00 4.50 3.00 0.97 1.94 0.98 1.00 0.99 1.00
## [364] 2.34 1.94 1.94 1.94 5.23 0.97 3.11 2.00 1.99 3.49 5.99
## [375] 1.91 4.00 4.29 1.07 3.00 1.84 2.48 2.60 3.00 2.98 2.99
## [386] 2.99 4.55 2.00 2.00 2.00 3.00 4.34 2.50 1.01 1.00 4.30
## [397] 2.15 2.30 0.98 0.99 1.00 4.48 5.40 4.43 4.18 4.89 0.99
## [408] 1.50 1.00 1.00 1.00 0.98 1.41 2.67 1.01 3.19 4.68 2.60
## [419] 2.60 1.94 3.36 2.91 3.11 4.77 2.39 2.37 1.94 2.14 2.95
## [430] 3.78 2.59 2.67 2.68 1.73 3.49 0.99 0.99 0.99 0.99 0.98
## [441] 0.98 1.00 1.75 1.94 4.00 1.00 3.68 4.18 3.24 3.00 1.94
## [452] 0.99 1.22 1.08 2.14 1.00 1.22 0.96 3.74 7.50 1.00 1.00
## [463] 1.80 4.77 3.42 2.00 1.00 0.99 0.98 0.99 0.99 0.98 1.00
## [474] 1.46 1.86 1.85 1.50 2.76 1.73 1.04 4.71 1.89 2.53 3.14
## [485] 3.09 1.68 4.00 5.94 3.40 3.92 4.06 2.74 4.00 4.50 4.44
## [496] 5.26 3.94 3.64 4.40 3.87 3.89 8.00 1.00 1.00 0.98 0.99
## [507] 1.04 2.46 1.94 2.94 4.83 4.12 1.78 4.99 5.17 4.99 2.84
## [518] 1.94 1.90 0.96 1.01 1.50 1.00 0.98 1.00 1.10 2.46 3.00
## [529] 2.21 2.50 3.00 3.59 1.00 1.00 2.43 2.62 2.95 2.00 2.64
## [540] 1.32 2.44 0.98 1.80 0.97 1.26 1.00 1.29 1.29 0.99 1.60
## [551] 1.50 4.00 4.86 3.16 4.65 3.30 1.29 3.00 3.13 1.12 4.00
## [562] 5.17 1.00 0.98 0.98 0.98 1.00 1.00 0.98 4.83 1.05 1.97
## [573] 6.97 4.99 3.16 4.41 4.99 3.00 3.11 2.90 3.11 2.43 3.88
## [584] 3.24 2.41 4.00 1.00 0.98 2.91 2.00 2.91 3.49 1.73 1.41
## [595] 1.00 1.01 3.49 2.00 1.00 1.00 1.00 1.00 1.50 0.99 0.99
## [606] 0.99 0.98 0.99 1.00 1.00 1.00 0.99 1.00 0.99 0.98 0.98
## [617] 1.00 0.99 1.00 0.98 1.00 0.98 1.00 1.39 3.88 3.12 1.00
## [628] 2.10 2.30 2.44 2.72 2.30 2.00 3.49 4.99 3.51 3.00 0.96
## [639] 1.22 1.02 1.01 1.01 2.73 2.95 3.41 0.98 1.94 1.77 3.12
## [650] 1.96 2.14 2.97 2.68 0.99 5.11 1.00 1.02 1.00 4.00 4.18
## [661] 3.00 4.61 3.11 0.99 2.23 2.00 1.07 1.00 2.50 2.50 1.26
## [672] 1.40 2.00 1.51 1.41 1.73 1.73 2.35 1.94 2.00 2.30 0.99
## [683] 0.97 1.00 1.01 2.64 3.85 0.97 1.00 1.00 0.98 0.98 1.00
## [694] 0.98 1.52 1.00 1.00 1.00 1.00 1.00 1.00 5.64 5.83 5.99
## [705] 5.84 4.04 5.80 5.01 7.48 1.87 2.00 1.00 0.98 1.03 1.00
## [716] 1.00 1.00 1.45 3.11 2.35 1.01 1.00 1.00 1.71 3.25 3.26
## [727] 3.78 0.98 0.99 0.99 0.99 0.99 0.99 1.00 0.99 1.00 1.22
## [738] 1.00 1.00 1.00 0.98 0.99 1.00 0.99 1.50 1.20 1.20 1.74
## [749] 0.98 2.68 0.99 1.00 5.99 6.20 4.67 4.99 3.65 4.66 4.00
## [760] 2.73 4.10 4.00 4.50 5.10 6.39 6.86 4.61 3.68 4.64 1.18
## [771] 2.91 2.04 1.94 1.66 1.50 1.94 3.33 1.01 1.01 1.00 1.02
## [782] 1.00 2.95 2.35 1.88 1.00 0.99 1.00 0.99 0.98 3.20 4.22
## [793] 3.67 3.69 0.98 0.99 1.00 0.96 0.98 0.98 0.98 0.98 2.67
## [804] 2.54 3.69 1.58 1.01 0.98 1.00 0.99 2.30 1.00 1.55 1.94
## [815] 3.00 1.97 2.75 1.44 4.16 2.91 2.01 3.24 4.88 3.09 3.00
## [826] 8.10 2.73 2.30 2.50 2.76 1.01 1.00 1.00 1.02 1.01 1.01
## [837] 1.01 1.01 1.00 1.01 1.01 1.01 1.02 1.01 1.00 4.10 4.54
## [848] 1.75 1.00 8.79 9.01 8.99 0.97 0.99 0.99 1.00 1.00 2.50
## [859] 1.00 0.98 0.98 0.98 1.00 1.00 2.50 1.00 2.46 2.91 4.04
## [870] 1.02 1.07 1.02 2.68 1.69 1.01 1.02 3.59 2.60 3.53 3.78
## [881] 3.49 3.02 3.13 2.04 1.41 7.96 11.23 9.23 2.91 3.29 3.54
## [892] 3.00 1.22 1.50 1.00 1.22 1.00 1.41 2.60 2.00 2.00 2.50
## [903] 2.50 2.20 2.15 2.50 2.17 0.98 1.54 4.68 5.70 4.21 2.09
## [914] 3.60 6.80 2.33 5.49 3.89 5.54 4.30 3.00 1.20 3.87 5.62
## [925] 3.32 5.16 2.47 1.40 5.93 7.89 5.51 2.46 2.91 4.32 5.06
## [936] 2.10 8.10 2.30 5.72 0.98 0.98 0.99 0.99 1.00 0.99 1.00
## [947] 1.00 1.00 0.99 1.22 1.00 0.98 1.00 5.18 4.50 1.98 3.49
## [958] 2.48 3.16 3.16 2.00 2.00 2.52 3.94 3.89 4.92 0.99 0.98
## [969] 0.99 0.99 3.00 4.65 4.99 4.50 4.41 4.62 3.58 0.98 0.98
## [980] 0.99 0.99 0.99 0.99 0.99 0.99 0.99 1.44 1.42 2.00 0.99
## [991] 3.00 1.50 1.00 1.84 2.51 5.34 8.48 6.14 1.97 4.73 1.37
## [1002] 5.96 4.40 4.76 2.95 4.18 3.49 5.54 6.93 3.92 6.69 5.15
## [1013] 1.94 4.40 4.30 5.44 6.74 7.82 6.50 4.50 3.36 5.07 4.00
## [1024] 8.48 6.48 4.04 5.34 3.10 4.83 5.37 4.31 5.01 5.60 4.10
## [1035] 1.00 1.97 1.77 5.25 1.97 2.99 2.99 4.00 2.90 4.50 3.11
## [1046] 1.00 0.98 1.00 1.88 1.00 0.98 0.99 0.99 0.99 0.98 1.00
## [1057] 1.21 0.98 0.99 0.98 0.98 1.00 0.99 0.99 0.98 0.99 0.99
## [1068] 1.00 1.00 1.00 1.50 0.99 0.98 0.98 1.00 1.00 1.00 0.98
## [1079] 4.50 1.02 4.00 3.69 0.99 3.94 4.50 4.86 3.52 2.25 2.25
## [1090] 2.33 7.10 6.39 1.35 0.98 1.96 1.98 1.86 2.00 2.14 0.97
## [1101] 3.45 3.11 2.79 1.29 2.70 2.00 2.59 2.00 1.00 4.00 2.50
## [1112] 2.95 2.81 0.98 3.30 4.95 3.64 4.77 3.59 0.96 6.55 3.72
## [1123] 3.88 4.08 2.50 2.00 1.06 3.20 1.00 1.00 1.00 4.18 2.99
## [1134] 1.94 2.37 3.70 4.86 4.41 0.99 1.01 3.69 4.01 2.95 3.07
## [1145] 3.28 0.99 1.00 1.91 1.82 1.73 1.33 6.36 0.98 1.00 0.99
## [1156] 1.00 0.99 1.00 1.00 1.12 1.75 2.54 1.94 0.98 0.98 0.98
## [1167] 1.00 0.99 1.00 0.99 0.99 0.99 2.00 2.00 1.54 2.62 5.31
## [1178] 2.04 1.22 1.61 1.57 3.15 3.84 3.30 1.40 1.00 0.98 1.97
## [1189] 1.00 1.00 1.00 3.40 3.59 3.23 3.63 4.37 1.68 1.00 4.00
## [1200] 1.14 1.02 2.10 0.99 1.00 2.86 1.59 1.65 3.70 1.29 1.70
## [1211] 1.38 0.99 2.20 0.98 1.01 1.22 1.19 1.16 1.22 1.00 2.51
## [1222] 1.00 3.49 1.00 2.16 1.01 1.05 2.75 1.96 2.14 2.60 8.47
## [1233] 6.28 5.48 1.00 0.98 1.00 1.01 1.01 1.01 1.01 1.01 0.97
## [1244] 2.00 3.40 3.29 3.00 2.50 1.52 1.07 1.69 2.15 1.94 2.00
## [1255] 0.98 4.49 0.97 1.06 2.02 1.22 2.20 1.60 1.50 2.00 2.04
## [1266] 2.91 1.56 1.17 0.98 1.00 1.09 5.48 4.37 3.94 4.00 4.45
## [1277] 4.18 4.86 3.53 3.49 3.18 1.94 3.08 3.88 2.52 1.94 3.70
## [1288] 2.43 1.94 3.09 2.39 5.99 4.27 0.99 0.98 1.21 2.12 1.00
## [1299] 1.00 0.98 2.91 4.76 2.00 2.23 2.91 3.69 2.75 3.40 2.56
## [1310] 4.86 3.88 2.79 3.59 2.38 3.49 3.00 1.94 1.00 0.99 1.00
## [1321] 0.97 1.00 2.00 1.07 1.50 2.00 1.50 2.46 3.20 1.94 1.01
## [1332] 4.66 4.80 1.00 0.98 1.00 1.00 1.00 1.00 0.99 0.98 0.98
## [1343] 0.99 0.99 0.99 2.79 2.16 2.84 2.25 1.00 3.40 0.99 1.00
## [1354] 1.01 1.02 1.00 1.00 2.50 1.28 0.99 0.98 1.00 1.00 0.99
## [1365] 1.98 1.40 4.59 2.00 0.99 1.07 0.99 6.04 5.74 6.49 4.99
## [1376] 4.19 4.02 4.25 1.00 0.98 1.00 1.01 1.02 1.01 1.01 2.10
## [1387] 1.98 1.40 1.86 2.29 1.21 1.96 1.73 1.37 1.96 1.96 0.98
## [1398] 1.96 1.92 2.00 1.02 1.01 1.14 0.98 1.00 0.98 0.99 0.99
## [1409] 0.98 0.99 0.98 1.22 2.60 2.23 2.15 2.15 2.68 2.81 2.64
## [1420] 2.23 1.07 2.73 3.00 1.11 5.44 3.88 3.75 3.49 3.67 3.28
## [1431] 3.36 1.01 1.01 1.02 1.01 2.50 2.23 2.00 2.30 2.00 1.41
## [1442] 1.00 1.02 2.62 3.06 2.64 2.68 1.47 1.00 3.49 0.97 1.22
## [1453] 3.89 4.90 3.00 3.19 3.00 3.49 2.91 2.00 2.62 3.00 4.43
## [1464] 3.00 3.00 1.00 1.00 1.00 1.00 1.00 0.98 1.00 0.98 0.99
## [1475] 1.00 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.22 2.76 2.50
## [1486] 1.94 1.00 1.00 1.00 4.00 3.80 1.00 1.01 1.00 5.17 3.35
## [1497] 5.49 0.98 3.00 3.00 0.98 1.00 1.01 1.22 0.98 1.00 1.00
## [1508] 1.01 1.00 1.00 1.00 1.00 0.99 1.00 6.75 5.75 2.00 1.83
## [1519] 2.00 0.98 1.00 10.00 9.85 3.70 4.58 3.45 4.45 6.40 2.47
## [1530] 2.15 3.09 8.99 5.37 4.37 4.23 5.88 2.15 5.99 3.65 3.87
## [1541] 5.34 2.15 2.15 1.50 3.49 0.98 1.00 3.40 3.40 3.40 2.21
## [1552] 3.88 3.76 3.11 3.88 2.99 4.18 2.91 3.94 3.88 3.49 5.28
## [1563] 0.97 0.97 1.00 0.99 0.98 0.99 1.00 1.96 1.00 0.98 0.99
## [1574] 0.99 1.00 1.00 0.99 1.00 0.99 0.98 0.98 1.00 1.00 0.99
## [1585] 0.99 0.98 0.98 1.00 0.98 0.98 1.00 1.00 1.00 0.99 1.00
## [1596] 1.39 1.97 1.84 2.00 2.02 1.93 1.82 1.90 1.50 1.02 3.24
## [1607] 2.91 3.80 4.50 3.88 5.40 6.71 5.71 2.69 7.52 3.91 8.70
## [1618] 5.24 5.84 7.50 6.48 3.54 5.62 5.75 1.01 1.00 1.02 1.02
## [1629] 3.00 2.59 3.89 3.49 3.24 3.63 1.94 3.20 3.00 4.00 3.09
## [1640] 1.94 2.98 2.66 2.91 1.90 2.65 2.43 2.59 4.99 4.50 3.00
## [1651] 1.00 2.46 3.08 1.96 0.98 0.99 2.00 2.20 1.92 1.50 1.38
## [1662] 1.50 2.00 1.41 1.08 1.85 2.90 6.90 1.02 5.29 3.49 4.37
## [1673] 5.18 7.50 5.44 3.00 1.02 1.50 1.64 5.95 6.56 6.69 7.24
## [1684] 5.49 6.49 4.94 4.99 5.99 4.18 5.49 5.33 4.58 4.89 7.00
## [1695] 5.84 7.09 3.99 5.99 8.99 5.46 4.89 3.14 0.98 1.07 0.99
## [1706] 7.77 8.69 5.99 5.18 4.34 5.91 6.50 7.97 6.50 6.07 7.08
## [1717] 4.89 7.59 6.94 5.49 6.32 8.49 8.08 7.77 4.93 7.88 4.99
## [1728] 0.97 1.00 3.09 2.05 3.00 3.56 1.62 1.70 5.07 4.00 4.62
## [1739] 3.96 4.41 0.99 2.38 0.99 0.99 0.98 0.99 0.99 0.99 0.98
## [1750] 0.98 0.98 0.98 3.49 4.00 2.99 1.89 3.10 3.49 1.00 1.00
## [1761] 1.00 1.00 0.98 0.96 1.44 1.00 1.01 1.02 1.01 0.98 1.00
## [1772] 2.43 2.50 2.71 2.29 3.23 2.46 3.11 1.41 1.49 3.86 6.59
## [1783] 4.58 3.24 6.24 3.66 4.52 5.47 1.00 0.98 0.98 1.78 2.69
## [1794] 1.99 1.02 4.27 3.09 1.08 1.00 1.40 1.41 0.99 1.00 1.11
## [1805] 0.98 4.37 4.59 3.49 3.89 4.50 4.99 2.43 3.88 4.89 4.99
## [1816] 2.38 4.50 4.95 3.49 3.93 4.04 5.93 3.92 4.85 4.37 5.01
## [1827] 4.00 4.14 3.75 6.80 5.49 3.93 2.59 2.38 3.40 1.00 1.00
## [1838] 0.99 1.00 0.99 1.00 0.99 1.02 1.01 1.00 4.86 2.19 3.70
## [1849] 5.34 0.98 1.00 0.98 3.28 4.59 4.86 3.89 3.77 1.00 0.98
## [1860] 1.25 1.00 2.52 4.47 4.99 3.49 3.93 3.44 5.15 1.01 1.00
## [1871] 5.19 4.99 1.01 1.02 1.01 1.42 4.57 5.44 7.77 6.90 0.98
## [1882] 1.00 6.32 9.36 8.00 8.90 1.22 0.99 2.91 0.98 1.00 2.50
## [1893] 0.97 0.99 1.22 0.99 2.82 1.00 1.93 1.22 1.78 2.62 1.47
## [1904] 1.00 1.01 1.02 1.06 1.02 2.46 5.64 4.94 3.88 4.41 3.56
## [1915] 4.47 5.16 5.72 1.00 2.73 2.91 2.00 2.50 1.01 1.00 1.00
## [1926] 1.00 1.00 1.00 1.00 1.00 1.40 2.79 3.00 0.99 0.98 0.99
## [1937] 0.99 0.98 0.98 1.00 1.00 0.98 0.99 0.99 1.00 4.86 4.76
## [1948] 3.21 0.99 1.00 4.99 3.76 4.00 4.47 1.00 2.76 2.00 1.65
## [1959] 3.20 1.33 1.01 4.61 4.99 6.50 3.79 3.49 4.86 6.15 2.20
## [1970] 0.98 3.20 2.91 3.88 0.98 0.98 0.98 1.01 1.50 1.02 1.02
## [1981] 1.01 1.00 1.01 1.00 1.00 1.00 1.00 2.00 3.00 1.37 0.97
## [1992] 1.00 1.64 4.24 2.47 2.91 1.92 2.00 2.12 4.07 3.71 2.17
## [2003] 0.99 0.98 4.30 1.46 3.00 3.49 2.39 2.24 1.66 1.50 0.98
## [2014] 0.98 0.99 0.98 0.98 1.00 4.00 3.24 2.16 1.50 1.21 1.00
## [2025] 1.00 1.50 2.00 1.00 1.92 2.89 3.74 5.07 3.49 2.00 2.50
## [2036] 5.62 3.49 7.65 4.50 4.00 2.12 4.25 4.59 2.36 3.80 1.60
## [2047] 2.00 3.45 2.06 2.00 1.94 1.41 5.36 5.04 5.47 5.75 4.99
## [2058] 4.23 2.76 2.23
##
## The decimal point is at the |
##
## 0 |
## 1 | 00000000000000000000000000000000000000000000000000000000000000000000+937
## 2 | 00000000000000000000000000000000000000000000000000000000000000000000+259
## 3 | 00000000000000000000000000000000000000000000000000000000000000000000+230
## 4 | 00000000000000000000000000000000000000000011111111122222222222222233+103
## 5 | 00000000000000000000000000000111111222222222222223333333333333444444+35
## 6 | 00000000000111122223333444455555555666777788899999
## 7 | 001111235555567888899
## 8 | 000011155557789
## 9 | 0000249
## 10 | 0
## 11 | 23
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.960 1.000 1.970 2.489 3.490 11.300
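# one histogram per order, in 2 rows of 3; the bottom row is taller to fit the x axis
layout(matrix(1:6, ncol=3, byrow=TRUE), heights=c(1, 1.2))
opar <- par(mar=c(1, 3, 1, 1)+.1)  # small margins for the top row
# shared breaks, so that all panels are directly comparable
xh <- hist(px$LitterSize, plot=FALSE, breaks=30)
for (k in 1:nlevels(px$Order)) {
    ord <- levels(px$Order)[k]
    if (k == 4) par(opar)  # restore the original margins for the bottom row
    with(subset(px, Order == ord),
         hist(LitterSize, xlim=c(0, max(px$LitterSize)),
              breaks=xh$breaks, main=ord,
              xaxt=if (k > 3) 's' else 'n',  # x axis only on the bottom row
              xlab=if (k > 3) 'litter size' else '') )
}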
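# overlay one semi-transparent histogram of x per level of the factor f,
# all on shared breaks so that the bars line up
overlay_hist <- function (x, f, breaks=30, ...) {
    xh <- hist(x, breaks=breaks, plot=FALSE)
    # tallest bar across all groups (computed from x and f, not from globals),
    # so that no histogram is cut off
    ymax <- max(sapply(levels(f), function (lev)
                    hist(x[f == lev], breaks=xh$breaks, plot=FALSE)$counts))
    for (k in 1:nlevels(f)) {
        hist(x[f==levels(f)[k]], breaks=xh$breaks, ...,
             add=(k>1), col=adjustcolor(k, 0.4), ylim=c(0, ymax))
    }
    legend("topright", fill=adjustcolor(1:nlevels(f), 0.4),
           legend=levels(f))
}
overlay_hist(px$LitterSize, px$Order, main='', xlab='litter size')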
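par(mar=c(9, 4, 1, 1) + 0.1)  # wide bottom margin for the rotated family names
# mean litter size per family (each family belongs to a single order)
famsize <- aggregate(LitterSize ~ Order + Family, data=px, mean)
# box positions: group families by order, then sort by mean litter size within order
famorder <- rank(with(famsize, LitterSize + 100 * as.numeric(Order)))
with(px, boxplot(LitterSize ~ Family, las=2, xlab='',
                 col=as.numeric(famsize$Order),
                 at=famorder))
# label each order above the middle of its block of families
text(x=tapply(famorder, famsize$Order, mean),
     y=10, labels=levels(famsize$Order))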
Challenge: visualize LitterSize by TeatNumber, using a boxplot.
The “gg” in ggplot stands for “grammar of graphics”:
introduced by Leland Wilkinson
adopted by Hadley Wickham in the ggplot2 library
thinks of plots as objects
see this chapter of R for Data Science
A plot has:
data
coordinate axes
a geom: a geometric representation of numbers
a mapping from (summaries of) variables to properties of the geoms
maybe more plots
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(
mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>
) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>
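Here is one filled-in instance of the template, as a sketch. The data frame `toy` and its columns `order`, `habitat`, and `litter` are made up for illustration. The `stat` and `position` values are just the defaults for `geom_boxplot()` written out explicitly. This requires the ggplot2 package.

```r
library(ggplot2)

# made-up data: litter sizes for three orders in two habitats
set.seed(3)
toy <- data.frame(
    order   = rep(c("Carnivora", "Primates", "Rodentia"), each = 20),
    habitat = rep(c("forest", "grassland"), times = 30),
    litter  = rpois(60, lambda = 3) + 1
)

p <- ggplot(data = toy) +                        # data
    geom_boxplot(                                # geom
        mapping = aes(x = order, y = litter),    # variables -> geom properties
        stat = "boxplot",                        # summary computed for each box
        position = "dodge2"                      # how overlapping boxes are placed
    ) +
    coord_flip() +                               # coordinate system: swap x and y
    facet_wrap(~ habitat)                        # one panel per habitat

print(p)
```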
Reference: the ggplot2 book.
Challenge: make this plot.
The cheatsheet might be helpful.