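This first function, group_by(), groups the rows of a data frame by the values of one or more categorical variables. It does not change the data itself, only how later functions treat it.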
by_day <- group_by(flights, year, month, day)
Once the data are grouped, summarise() calculates a summary value (in this case the mean) of other variables within each group, returning one row per combination of the grouping variables.
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
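The same group-then-summarise pattern can be tried on a tiny data set where the result is easy to check by hand. This is only a minimal sketch: the `toy` table and its values are made up for illustration and are not part of the flights data, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: two days, with one missing delay on day 2
toy <- tibble(
  day       = c(1, 1, 2, 2),
  dep_delay = c(10, 20, NA, 30)
)

# Step 1: group the rows by day (the data themselves are unchanged)
by_day_toy <- group_by(toy, day)

# Step 2: compute one mean delay per group, ignoring missing values
result <- summarise(by_day_toy, delay = mean(dep_delay, na.rm = TRUE))
print(result)
```

The result has one row per day: the mean of 10 and 20 for day 1, and the single non-missing value for day 2.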
- Note: without na.rm = TRUE you can get a lot of missing values!
- That’s because aggregation functions obey the usual rule of missing values:
- if there’s any missing value in the input, the output will be a missing value.
- Fortunately, most aggregation functions (mean(), sum(), median(), ...) have an na.rm argument, which removes the missing values prior to computation.
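The missing-value rule described in the notes above can be seen with base R alone. The vector `x` here is a made-up example, not taken from the flights data.

```r
# Hypothetical input with one missing value
x <- c(2, 4, NA, 6)

# Any NA in the input makes the result NA
mean_with_na <- mean(x)

# na.rm = TRUE drops the NA before computing, so the mean is taken over 2, 4, 6
mean_without_na <- mean(x, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```

The same behaviour carries over inside summarise(), which is why the call above passes na.rm = TRUE to mean().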