Time Series Toolbox
Birgit Erni
25 April 2019
Outline: Time Series in Practice
- When and why do we need time series models?
- Basic models and definitions: white noise, AR1, MA, random walk, stationarity.
- 3 approaches to time series modelling: ARIMA, Regression, Structural time series / state-space models
Goal: understand the basic difficulties with time series and construct a few simple but useful models.
References
- Hyndman, R.J. & Athanasopoulos, G. (2018). Forecasting: Principles and Practice, 2nd edition. OTexts: Melbourne, Australia. https://otexts.com/fpp2
- Holmes, E.E., Scheuerell, M.D. & Ward, E.J. (2019). Applied Time Series Analysis for Fisheries and Environmental Sciences. https://nwfsc-timeseries.github.io/atsa-labs/index.html
More technical:
- Shumway, R.H. and Stoffer, D.S., 2017. Time series analysis and its applications: with R examples. Springer. https://www.stat.pitt.edu/stoffer/tsa4/
Motivating Examples
- Mauna Loa atmospheric CO2 concentration
- Shark attacks in Florida (data from the R package bsts, Bayesian structural time series)
- Financial series
- Sunspot area
- Electricity production
- Tree rings
- Electricity demand
Several of these series come with the package fpp2, which accompanies the book by Hyndman & Athanasopoulos: Forecasting: Principles and Practice, 2nd edition.
Goals of Time Series Analysis
- prediction / forecast
- impact of single event
- study causal patterns
- detect trends, changes, shifts (in mean, seasonality, variance)
Time Series in R
Create time series of quarterly data:
ts(rnorm(40), frequency = 4, start = c(2019, 2))
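A few things one can do with such a ts object (a minimal sketch; the object name z is just for illustration):
z <- ts(rnorm(40), frequency = 4, start = c(2019, 2))
plot(z)                                            # plot against calendar time
frequency(z)                                       # 4 observations per year
window(z, start = c(2020, 1), end = c(2021, 4))    # extract a sub-series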
When and why do we need time series models?
When there is auto-correlation in the residuals (after modelling trends, seasonality, effects of explanatory variables).
If we ignore autocorrelation:
- standard error estimates are wrong
- predictions and prediction intervals are wrong
Terminology / Definitions
time series: \(y_1, ..., y_t\)
autocorrelation: correlation of the series with its own previous (lagged) values
Basic time series processes
AR process (auto-regressive)
\[\mbox{AR(p):} \qquad x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + e_t\]
\[\mbox{AR1:} \qquad x_t = \phi x_{t-1} + e_t\]
Why would anything behave like this?
I see \(x_{t-1}\) as a measure of everything that was not measured explicitly at the previous time step.
Random walk
\[x_t = x_{t-1} + e_t\]
AR1 processes, different \(\phi\)
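A minimal sketch for simulating these processes in R (arima.sim for stationary AR1 processes, cumulative sums of white noise for the random walk); the values of \(\phi\) are only examples:
set.seed(42)
n <- 200
par(mfrow = c(3, 1))
plot(arima.sim(model = list(ar = 0.3), n = n), ylab = "AR1, phi = 0.3")
plot(arima.sim(model = list(ar = 0.9), n = n), ylab = "AR1, phi = 0.9")
plot(ts(cumsum(rnorm(n))), ylab = "random walk")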
Stationarity
- mean, variance, correlations stay constant over time
Why is stationarity important?
There is a single observation per time point. If mean and variance are different for every point, we can’t estimate mean and variance, correlation or model parameters.
AR1 processes are stationary if \(|\phi| < 1\).
Non-stationarity means: the mean changes, the variance changes, seasonality is present, or the correlation structure changes over time.
MA(q) process (moving average)
\[y_t = \theta_1 e_{t-1} + \theta_2 e_{t-2} + \ldots + \theta_q e_{t-q} + e_t\]
a weighted sum of previous random shocks / events
White noise
independently and identically distributed, mean 0, no autocorrelation
ACF and PACF
autocorrelation function, partial autocorrelation function
Partial autocorrelation at lag k is the correlation between \(z_t\) and \(z_{t+k}\) that is not accounted for by lags 1 to \(k-1\).
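A minimal sketch: for an AR(1) series the ACF decays gradually while the PACF cuts off after lag 1 (the value 0.7 below is just an example):
x <- arima.sim(model = list(ar = 0.7), n = 200)
acf(x)     # autocorrelation function: decays gradually
pacf(x)    # partial autocorrelation function: cuts off after lag 1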
Three approaches to time series modelling
1. ARIMA, very briefly
ARIMA(p, d, q)
p = order of the autoregressive part, d = order of differencing, q = order of the moving average part
If \(y_t\) is not stationary then \(y_t - y_{t-1}\) sometimes is (first order differences).
ARIMA: auto-regressive, integrated, moving average
‘integrated’ refers to differencing
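A minimal sketch of fitting an ARIMA model in R (the series y and the order (1, 1, 1) are only examples; auto.arima comes from the forecast package, loaded with fpp2):
plot(diff(y))                          # do the first differences look stationary?
fit <- arima(y, order = c(1, 1, 1))    # ARIMA(1, 1, 1)
tsdiag(fit)                            # residual diagnostics
## forecast::auto.arima(y) chooses p, d and q automatically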
Three approaches to time series modelling
2. Regression
Either ignore the auto-correlation, or model it in the errors like this:
\[y_t = \beta_0 + \beta_1 x_t + \nu_t\]
\[\nu_t = \phi \nu_{t-1} + e_t\]
Not this:
\[y_t = \beta_0 + \beta_1 x_t + \phi y_{t-1} + e_t\]
Problem: \(\beta_1\) is no longer the change in the response per unit change in \(x\).
Simulate some data with autocorrelation and fit a regression model (a sketch of this simulation follows below).
acf2 comes from the package astsa; it plots the ACF and PACF together, omitting the lag-0 autocorrelation (which is always 1).
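A sketch of what this simulation could look like, using the true values given further below (\(\beta_0 = 3, \beta_1 = -2, \phi = 0.9, \sigma^2 = 0.64\)); the distribution of x and the seed are assumptions, so the output shown below will not match exactly:
library(astsa)
set.seed(1)
n <- 100
x <- rnorm(n, mean = 5, sd = 2)                            # explanatory variable (assumed)
nu <- arima.sim(model = list(ar = 0.9), n = n, sd = 0.8)   # AR1 errors, sigma^2 = 0.64
y <- 3 - 2 * x + nu
m1 <- lm(y ~ x)
acf2(resid(m1))    # clear autocorrelation left in the residuals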
ARMA errors
arima with xreg models the autocorrelation in the errors (regression with ARMA errors):
a1 <- arima(y, order = c(0, 0, 0), xreg = x)
a2 <- arima(y, order = c(1, 0, 0), xreg = x)
##
## Call:
## arima(x = y, order = c(1, 0, 0), xreg = x)
##
## Coefficients:
## ar1 intercept x
## 0.8238 2.1524 -1.9644
## s.e. 0.0544 0.5223 0.0640
##
## sigma^2 estimated as 0.5757: log likelihood = -114.85, aic = 237.7
Compare to true values: \(\beta_0 = 3, \beta_1 = -2, \phi = 0.9, \sigma^2 = 0.64\)
Regression without ARMA errors
##
## Call:
## arima(x = y, order = c(0, 0, 0), xreg = x)
##
## Coefficients:
## intercept x
## 1.4424 -1.8305
## s.e. 0.7726 0.1494
##
## sigma^2 estimated as 1.842: log likelihood = -172.43, aic = 350.86
Coefficient estimates are further from true values.
Terminology
- fitted values: \(\hat{y}_t | y_1, y_2, \ldots, y_{t-1}\), usually a one-step-ahead forecast using the model
- residuals: \(y_t - \hat{y}_t\)
- forecasts: \(y_{t+h} | y_1, y_2, \ldots, y_t\)
Point forecasts need PREDICTION INTERVALS
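For the regression with ARMA errors fitted above with arima, predict() gives point forecasts and standard errors, from which approximate 95% prediction intervals follow; x.future (future values of the regressor) is assumed to be available:
fc <- predict(a2, n.ahead = 5, newxreg = x.future)
cbind(lower = fc$pred - 1.96 * fc$se,
      fit   = fc$pred,
      upper = fc$pred + 1.96 * fc$se)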
Alternative for regression with ARMA errors: GLS
GLS: generalized least squares: linear regression with correlated errors
library(nlme)
g1 <- gls(y ~ x)
g2 <- gls(y ~ x, correlation = corAR1(form = ~ 1))
In model g2 we have assumed an AR1 process for the errors, i.e. \(e_t = \phi e_{t-1} + w_t, \qquad w_t \sim N(0, \sigma^2)\).
In model g1 we assume independent errors; this is equivalent to a simple linear regression.
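To compare the two fits (a sketch; in the gls output, Phi is the estimated AR1 coefficient):
summary(g2)     # Phi under "Correlation Structure" is the estimated AR1 coefficient
intervals(g2)   # confidence intervals for the coefficients and for phi
AIC(g1, g2)     # the AR1-error model should have much lower AIC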
3rd approach: Structural Time Series Models
Trend, season and error are modelled explicitly, similar to regression, which makes these models easier to understand.
\[y_t = T_t + S_t + X_t + R_t\]
\(T_t\) = trend
\(S_t\) = seasonality
\(R_t\) = remainder
\(X_t\) = regression terms
International Encyclopedia of Statistical Science: https://link.springer.com/referenceworkentry/10.1007%2F978-3-642-04898-2_577
The R function decompose() splits a series into these components; see the sketch below.
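For example, a classical decomposition of the Mauna Loa CO2 series (the built-in co2 dataset); stl is a more flexible alternative:
plot(decompose(co2))                   # trend, seasonal and remainder components
plot(stl(co2, s.window = "periodic"))  # loess-based decomposition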
Structural Time Series Models
These are linear Gaussian state-space models for univariate time series, based on a decomposition of the series into components.
Some basic models:
1. Local Level:
\[\mu_t = \mu_{t-1} + \xi_t, \qquad \xi_t \sim N(0, \sigma^2_\xi)\]
with observations:
\[y_t = \mu_t + e_t, \qquad e_t \sim N(0, \sigma^2_e)\]
level / mean is a random walk, errors independent
2. Local Linear Trend:
\[\mu_{t} = \mu_{t-1} + \beta_t + \xi_t, \qquad \xi_t \sim N(0, \sigma^2_\xi)\]
\[\beta_t = \beta_{t-1} + w_t \qquad w_t \sim N(0, \sigma^2_w)\]
dynamic / time-varying trend (change in mean)
observations as before
3. Basic Structural Model
\[y_t = \mu_t + \tau_t + e_t, \qquad e_t \sim N(0, \sigma^2_e)\]
\(\tau_t\) is the seasonal component with dynamics
\[\tau_t = - \sum_{s=1}^{S-1} \tau_{t-s} + w_t, \qquad w_t \sim N(0, \sigma^2_w)\]
Without the error term these would be fixed effects summing to 0; the added error term allows the seasonal effects to change over time.
or
\[\tau_t = \alpha_1 \cos\left(\frac{2 \pi t}{\omega}\right) + \alpha_2 \sin\left(\frac{2 \pi t}{\omega}\right)\]
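In R, the basic structural model can be fitted with StructTS (a sketch, using the co2 series again; packages such as bsts or KFAS fit richer versions):
fit <- StructTS(co2, type = "BSM")    # local trend + seasonal components
fit$coef                              # estimated variances of the components
plot(cbind(co2, fitted(fit)))         # data with filtered level, slope and seasonal
tsdiag(fit)                           # residual diagnostics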
State-space model
Same as a structural model: state variables are modelled, observations given state are independent.
\(\mu\) and \(\delta\) are the state variables, which can change over time. \(\mu\) is the mean, \(\delta\) is the change in mean.
\[y_t = \underbrace{\mu_t}_\text{trend} + \underbrace{\tau_t}_\text{seasonal} + \underbrace{X_t\beta_t}_\text{regression} + e_t \]
\[\mu_t = \mu_{t-1} + \underbrace{\delta_t}_\text{slope} + u_t\]
\[\delta_t = \delta_{t-1} + w_t \]
\(e_t, u_t, w_t\) are independent (Gaussian) white noise.
Fitting State-space models using JAGS
The following code should be copied to a file called ssm.jags.
## JAGS model for simple state-space model
## local level
model {
  ## Prior for error of observation process
  sigma.obs ~ dunif(0, 10)
  tau.obs <- pow(sigma.obs, -2)

  ## Prior for error of state process
  sigma.proc ~ dunif(0, 10)
  tau.proc <- pow(sigma.proc, -2)

  ## State process
  mu[1] ~ dnorm(y0, 0.001)
  for (t in 2:T) {
    r[t] ~ dnorm(0, tau.proc)
    mu[t] <- mu[t-1] + r[t]
  }

  ## Observation process / model / likelihood
  for (t in 1:T) {
    y[t] ~ dnorm(mu[t], tau.obs)
  }
}
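A sketch of fitting this model from R with the rjags package (R2jags works similarly); y is the observed series and y0 its first value:
library(rjags)
jags.data <- list(y = y, T = length(y), y0 = y[1])
m <- jags.model("ssm.jags", data = jags.data, n.chains = 3)
update(m, 1000)                        # burn-in
out <- coda.samples(m, variable.names = c("mu", "sigma.obs", "sigma.proc"),
                    n.iter = 5000)
summary(out[, c("sigma.obs", "sigma.proc")])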
Link to Henning Winker’s slides: https://maialesosky.files.wordpress.com/2016/02/bayesian-state-space-model-applications-for-time-series-analysis-1.pdf. He did 2 SEEC toolbox seminars on state-space models using JAGS.
JAGS: electricity demand, local level state-space model
In JAGS one can obtain forecasts by adding NAs to the response vector (see the sketch below). Because the above model does not include weekly or temperature effects, the forecasts are terrible.
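For example, to forecast h days ahead, the data passed to JAGS could be set up like this (a sketch; h = 14 is arbitrary):
h <- 14
jags.data <- list(y = c(y, rep(NA, h)), T = length(y) + h, y0 = y[1])
## after re-fitting, the posterior of mu[(length(y) + 1):(length(y) + h)]
## is the forecast distribution of the level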
JAGS: state-space model with regression terms
Copy the following code into a file called ssm2.jags.
In this model I have added a quadratic effect for temperature and a fixed effect for weekday. The only changes are to the observation model, plus priors for the beta coefficients. The level now estimates long-term changes in the mean that are not explained by temperature or weekday.
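The contents of ssm2.jags are not reproduced here; the following is a minimal sketch of what such a model could look like, assuming data vectors temp (temperature) and weekday (a 0/1 weekday indicator):
## JAGS model: local level + quadratic temperature effect + weekday effect (sketch)
model {
  ## Priors for regression coefficients
  beta1 ~ dnorm(0, 0.001)    # temperature
  beta2 ~ dnorm(0, 0.001)    # temperature squared
  beta3 ~ dnorm(0, 0.001)    # weekday effect
  ## Priors for error variances
  sigma.obs ~ dunif(0, 10)
  tau.obs <- pow(sigma.obs, -2)
  sigma.proc ~ dunif(0, 10)
  tau.proc <- pow(sigma.proc, -2)
  ## State process: local level (random walk)
  mu[1] ~ dnorm(y0, 0.001)
  for (t in 2:T) {
    r[t] ~ dnorm(0, tau.proc)
    mu[t] <- mu[t-1] + r[t]
  }
  ## Observation process
  for (t in 1:T) {
    y[t] ~ dnorm(mu[t] + beta1 * temp[t] + beta2 * pow(temp[t], 2) + beta3 * weekday[t], tau.obs)
  }
}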
There is still some pattern left: residuals at 7-day lags are positively correlated. This suggests changing the weekday/weekend effect to a separate effect for each day of the week.
The forecasts are only shown for the level, but should really be made for the fitted values (the estimated response). For this we need future values of temperature and weekday; forecasting is a bit more difficult when there are regression terms in the model.