\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Time series

Peter Ralph

14 January 2020 – Advanced Biological Statistics

Time series data

Time series

A time series is a sequence of observations \[\begin{aligned} (y_1, y_2, \ldots, y_N) , \end{aligned}\] that were taken at some set of times \[\begin{aligned} t_1 < t_2 < \cdots < t_N . \end{aligned}\]

Goals might be to:

  • describe patterns
  • discover trends
  • quantify correlations
  • predict future values

The thing about time

Many statistical methods assume independence of observations.

Nearby observations in a time series are definately not.

Solutions: model the correlation; subtract it off.

An oscillator

A discrete, noisy oscillator

Suppose we have regular, noisy observations from a discrete, noisy oscillator.

The system itself does \[\begin{aligned} x_{t+1} - x_t &= \alpha y_t + \Normal(0, \sigma_{xy}) \\ y_{t+1} - y_t &= - \beta x_t + \Normal(0, \sigma_{xy}) \end{aligned}\] but we only get to observe \[\begin{aligned} X_t &= x_t + \Normal(0, \sigma_\epsilon) \\ Y_t &= y_t + \Normal(0, \sigma_\epsilon) . \end{aligned}\]

Here’s what this looks like.

plot of chunk plot_osc

A Stan block

## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.45, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

How’d we do?

##           truth       mean      se_mean           sd       2.5%        25%        50%        75%      97.5%       n_eff      Rhat
## alpha      0.10 0.09976285 6.021567e-05 0.0017514028 0.09629297 0.09855202 0.09972681 0.10097621 0.10309224  845.966230 1.0006770
## beta       0.05 0.04989406 3.039775e-05 0.0008868576 0.04824137 0.04926927 0.04986840 0.05049346 0.05172836  851.186636 1.0009753
## sigma_xy   0.01 0.01605386 1.171786e-03 0.0036154651 0.01104737 0.01332916 0.01554787 0.01780032 0.02528952    9.519889 1.4247695
## sigma_eps  1.00 0.99157676 5.863422e-04 0.0220641441 0.95013703 0.97525310 0.99101743 1.00719460 1.03560456 1416.028210 0.9987841

Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.

plot of chunk show_osc_fit

A noisier oscillator

More realism?

Let’s try that again, with more noise.

Here’s what this looks like.

plot of chunk plot_osc2

## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

How’d we do?

##           truth       mean      se_mean          sd       2.5%        25%        50%        75%      97.5%    n_eff      Rhat
## alpha      0.10 0.09951669 3.934835e-05 0.001594103 0.09637761 0.09842452 0.09954156 0.10058203 0.10263996 1641.268 0.9993526
## beta       0.05 0.04898589 2.859515e-05 0.001144167 0.04670660 0.04822168 0.04899289 0.04975163 0.05129391 1601.010 1.0010777
## sigma_xy   2.00 2.04486066 9.993463e-03 0.130059864 1.80366354 1.95971901 2.03881715 2.12445243 2.30083361  169.377 1.0005429
## sigma_eps  4.00 3.75648565 3.566001e-03 0.115847651 3.52383531 3.67657052 3.75798270 3.83640882 3.97456349 1055.386 0.9983640

Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.

plot of chunk show_osc_fit2

Missing data

Even more realism?

Now what if we actually don’t observe most of the \(Y\) values?

Here’s what this looks like.

plot of chunk plot_osc3

A new Stan block

## Warning: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.31, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

How’d we do?

##           truth       mean      se_mean          sd       2.5%        25%        50%        75%      97.5%      n_eff     Rhat
## alpha      0.10 0.09863762 0.0002607767 0.003744194 0.09199764 0.09601437 0.09837469 0.10095379 0.10683077  206.14798 1.011433
## beta       0.05 0.05016057 0.0001329084 0.001892851 0.04639661 0.04898090 0.05023687 0.05146732 0.05376685  202.82802 1.013808
## sigma_xy   0.05 0.04134701 0.0013517795 0.006084007 0.03134862 0.03669677 0.04096572 0.04545947 0.05404657   20.25665 1.307217
## sigma_eps  0.50 0.49036338 0.0005004922 0.017022319 0.45726336 0.47920755 0.48966945 0.50093498 0.52475828 1156.75896 1.001548

Here is a density plot of 100 estimated trajectories (of x and y) from the Stan fit.

plot of chunk show_osc_fit3

CO2 concentrations

Atmospheric CO2 concentrations at Mauna Loa

plot of CO2 concentrations on NOAA’s site
plot of CO2 concentrations on NOAA’s site

Monthly averages

# The "average" column contains the monthly mean CO2 mole fraction determined
# from daily averages.  The mole fraction of CO2, expressed as parts per million
# (ppm) is the number of molecules of CO2 in every one million molecules of dried
# air (water vapor removed).  If there are missing days concentrated either early
# or late in the month, the monthly mean is corrected to the middle of the month
# using the average seasonal cycle.  Missing months are denoted by -99.99.
# The "interpolated" column includes average values from the preceding column
# and interpolated values where data are missing.  Interpolated values are
# computed in two steps.  First, we compute for each month the average seasonal
# cycle in a 7-year window around each monthly value.  In this way the seasonal
# cycle is allowed to change slowly over time.  We then determine the "trend"
# value for each month by removing the seasonal cycle; this result is shown in
# the "trend" column.  Trend values are linearly interpolated for missing months.
# The interpolated monthly mean is then the sum of the average seasonal cycle
# value and the trend value for the missing month.

The data

##   year month decimal_date average interpolated  trend num_days
## 1 1958     3     1958.208  315.71       315.71 314.62       -1
## 2 1958     4     1958.292  317.45       317.45 315.29       -1
## 3 1958     5     1958.375  317.50       317.50 314.71       -1
## 4 1958     6     1958.458      NA       317.10 314.85       -1
## 5 1958     7     1958.542  315.86       315.86 314.98       -1
## 6 1958     8     1958.625  314.93       314.93 315.94       -1

plot of chunk plot_co2

Trend plus seasonal

plot of chunk trend_co2

Seasonal

plot of chunk plot_seasonal

In class

plot of chunk do_interp

plot of chunk do_interp

plot of chunk do_interp

plot of chunk do_interp

plot of chunk do_interp

plot of chunk do_interp

plot of chunk do_interp

Interlude: Correlation

xkcd/552
xkcd/552

Pearson’s Product-Moment Correlation Coefficent

(a.k.a. “the correlation”)

\[\begin{aligned} r = \cor[x,y] = \frac{1}{n-1} \sum_{i=1}^n \left(\frac{ x_i - \bar x }{\sd[x]}\right) \left(\frac{ y_i - \bar y }{\sd[y]}\right) \end{aligned}\] where \[\begin{aligned} \bar x = \text{mean}[x] &= \frac{1}{n} \sum_{i=1}^n x_i \\ s_x = \sd[x] &= \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2} \end{aligned}\]

Guess the correlation?

plot of chunk guess_cors

Guess the correlation?

plot of chunk guess_cors2

> help(cor)

Correlation, Variance and Covariance (Matrices)

Description:

     ‘var’, ‘cov’ and ‘cor’ compute the variance of ‘x’ and the
     covariance or correlation of ‘x’ and ‘y’ if these are vectors.  If
     ‘x’ and ‘y’ are matrices then the covariances (or correlations)
     between the columns of ‘x’ and the columns of ‘y’ are computed.

     ‘cov2cor’ scales a covariance matrix into the corresponding
     correlation matrix _efficiently_.

Usage:

     var(x, y = NULL, na.rm = FALSE, use)
     
     cov(x, y = NULL, use = "everything",
         method = c("pearson", "kendall", "spearman"))

Rank-based statistics

Spearman’s correlation coefficient (or, “Spearman’s rho”)

is just “the” correlation coefficient between the ranks of the vectors.

It is nonparametric - it makes sense regardless of the data’s distribution.

Autocorrelation

The autocorrelation function of a time series \((x(t) : 0 \le t \le L)\) is the correlation between points as a function of lag:

\[\begin{aligned} \rho(\ell) &= \cor[x(T), x(T+\ell)], \end{aligned}\]

where \(T\) is uniform between 0 and \(L-\ell\).

\[\begin{aligned} \rho(\ell) &= \frac{1}{T-\ell-1} \sum_{t=1}^{T-\ell} \left(\frac{ x(t) - \bar x }{ \sd[x] } \right) \left(\frac{ x(t + \ell) - \bar x }{ \sd[x] } \right) . \end{aligned}\]

Computing autocorrelation:

The acf( ) function requires a time series object:

plot of chunk co2_ts

Seasonal autocorrelation

plot of chunk seasonal_acf

Smoothing

loess

loess soil
loess soil

loess

> help(loess)

Local Polynomial Regression Fitting

Description:

     Fit a polynomial surface determined by one or more numerical
     predictors, using local fitting.

Usage:

     loess(formula, data, weights, subset, na.action, model = FALSE,
           span = 0.75, enp.target, degree = 2,
           parametric = FALSE, drop.square = FALSE, normalize = TRUE,
           family = c("gaussian", "symmetric"),
           method = c("loess", "model.frame"),
           control = loess.control(...), ...)

Example

plot of chunk loess

plot of chunk loess2

A key parameter: span

plot of chunk loess3

Ocean temperatures

Ocean temperature is available from NOAA buoys at tidesandcurrents.noaa.gov:

tidesandcurrents.noaa.gov
tidesandcurrents.noaa.gov

Ten years of data for Newport, OR are in the Datasets/ directory

buoy location
buoy location

Reading in dates:

plot of chunk load_ocean

2012:

plot of chunk twentytwelve

July, 2012:

plot of chunk july_twentytwelve

Autocorrelation

plot of chunk ocean_acf0

Removing the trend

plot of chunk smooth1

plot of chunk smooth2

Remaining autocorrelation

plot of chunk ocean_acf

A 12 hour cycle?

plot of chunk ocean_acf2

A 12 * (1 + 1/27.2) hour cycle!

plot of chunk ocean_acf3

… why?

Addendum

Spectra

plot of chunk thespec