Peter Ralph
14 January 2021 – Advanced Biological Statistics
# The "average" column contains the monthly mean CO2 mole fraction determined
# from daily averages. The mole fraction of CO2, expressed as parts per million
# (ppm) is the number of molecules of CO2 in every one million molecules of dried
# air (water vapor removed). If there are missing days concentrated either early
# or late in the month, the monthly mean is corrected to the middle of the month
# using the average seasonal cycle. Missing months are denoted by -99.99.
# The "interpolated" column includes average values from the preceding column
# and interpolated values where data are missing. Interpolated values are
# computed in two steps. First, we compute for each month the average seasonal
# cycle in a 7-year window around each monthly value. In this way the seasonal
# cycle is allowed to change slowly over time. We then determine the "trend"
# value for each month by removing the seasonal cycle; this result is shown in
# the "trend" column. Trend values are linearly interpolated for missing months.
# The interpolated monthly mean is then the sum of the average seasonal cycle
# value and the trend value for the missing month.
co2 <- read.table("../Datasets/Mauna_Loa_C02/co2_mm_mlo.txt", comment="#")
names(co2) <- c("year", "month", "decimal_date", "average", "interpolated", "trend", "num_days")
co2[co2 == -99.99] <- NA
head(co2)
## year month decimal_date average interpolated trend num_days
## 1 1958 3 1958.208 315.71 315.71 314.62 -1
## 2 1958 4 1958.292 317.45 317.45 315.29 -1
## 3 1958 5 1958.375 317.50 317.50 314.71 -1
## 4 1958 6 1958.458 NA 317.10 314.85 -1
## 5 1958 7 1958.542 315.86 315.86 314.98 -1
## 6 1958 8 1958.625 314.93 314.93 315.94 -1
The autocorrelation function of a time series \((x(t) : 0 \le t \le L)\) is the correlation between points as a function of lag:
\[\begin{aligned} \rho(\ell) &= \cor[x(T), x(T+\ell)], \end{aligned}\]
where \(T\) is uniform between 0 and \(L-\ell\).
\[\begin{aligned} \rho(\ell) &= \frac{1}{T-\ell-1} \sum_{t=1}^{T-\ell} \left(\frac{ x(t) - \bar x }{ \sd[x] } \right) \left(\frac{ x(t + \ell) - \bar x }{ \sd[x] } \right) . \end{aligned}\]
The acf( )
function requires a time series object:
loess
> help(loess)
Local Polynomial Regression Fitting
Description:
Fit a polynomial surface determined by one or more numerical
predictors, using local fitting.
Usage:
loess(formula, data, weights, subset, na.action, model = FALSE,
span = 0.75, enp.target, degree = 2,
parametric = FALSE, drop.square = FALSE, normalize = TRUE,
family = c("gaussian", "symmetric"),
method = c("loess", "model.frame"),
control = loess.control(...), ...)
span
Ocean temperature is available from NOAA buoys at tidesandcurrents.noaa.gov:
Ten years of data for Newport, OR are in the Datasets/ directory
ocean$numdate <- as.numeric(ocean$date)
month_smooth <- loess(WATER_TEMP_F ~ numdate, data=ocean,
span=sum(ocean$Month < 4 & ocean$Year == 2012)/nrow(ocean))
ocean$month_sm <- predict(month_smooth, newdata=ocean)
plot(WATER_TEMP_F ~ date, data=ocean, pch=20, cex=0.5, ylim=c(35, 65))
lines(month_sm ~ date, data=ocean, col='red', lwd=2)
12 * (1 + 1/27.2)
hour cycle!acf(water, lag.max=10 * 12, xlab='lag (hours)', na.action=na.pass)
abline(v=12*(1 + 1/27.2) * 1:10, col=c('red', 'blue'), lwd=2)
… why?