Getting started with the forecast package for time series data in R, as quickly as possible and with minimal explanation.
Source: Forecasting: Principles and Practice
Coerce your data to ts format:
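A minimal sketch of the coercion using base R's ts(). The values, start date, and monthly frequency below are illustrative assumptions, not data from the original post.

```r
# Illustrative monthly values; replace with your own numeric vector or column
values <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
            115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140)

# start = c(year, period); frequency = observations per seasonal cycle
# (12 for monthly data, 4 for quarterly, and so on)
y <- ts(values, start = c(2015, 1), frequency = 12)

frequency(y)  # number of observations per year
start(y)      # first time point
end(y)        # last time point
```

The later functions in this cheat sheet read the seasonal period from the frequency attribute, so it is worth checking it right after the coercion.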
autoplot(): Useful function to plot data and forecasts
Seasonality
- ggseasonplot(): Create a seasonal plot
- ggsubseriesplot(): Create mini plots for each season and show seasonal means
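As a rough sketch, both seasonal plots can be drawn on any seasonal ts object. The built-in AirPassengers series is used here only as a stand-in for your own data.

```r
library(forecast)

y <- AirPassengers  # built-in monthly series, used as example data

# One line per year, plotted against the month
p_season <- ggseasonplot(y, year.labels = TRUE)

# One mini panel per month, with a horizontal line at the monthly mean
p_subseries <- ggsubseriesplot(y)

print(p_season)
print(p_subseries)
```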
Lags and ACF
- gglagplot(): Plot the time series against lags of itself
- ggAcf(): Plot the autocorrelation function (ACF)
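A short sketch of both plots, again assuming the built-in AirPassengers series as example data. The lag.max value is an arbitrary choice.

```r
library(forecast)

y <- AirPassengers  # example data only

# Scatterplots of y[t] against y[t - k] for several lags k
p_lag <- gglagplot(y)

# Autocorrelations up to two seasonal cycles
p_acf <- ggAcf(y, lag.max = 24)

print(p_lag)
print(p_acf)
```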
White Noise and the Ljung-Box Test
White noise is another name for a time series of iid data, i.e. purely random. Ideally your model residuals should look like white noise.
You can use the Ljung-Box test to check if a time series is white noise, here’s an example with 24 lags:
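A minimal sketch of that test using Box.test() from base R's stats package. The simulated series is an assumption made so the snippet runs on its own; substitute your own series or residuals.

```r
set.seed(42)
wn <- ts(rnorm(200))  # simulated white noise, for illustration only

# Ljung-Box test over the first 24 lags
bt <- Box.test(wn, lag = 24, type = "Ljung-Box")
bt

bt$p.value  # compare against your significance level, e.g. 0.05
```

When testing residuals from a fitted model, Box.test() also takes a fitdf argument for the number of estimated parameters.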
A p-value > 0.05 suggests the data are not significantly different from white noise.
The forecast package includes a few common models out of the box. Fit the model to create a model object, then call the forecast() function on that object with the number of periods h to predict.
Example of the workflow:
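One possible version of that workflow, sketched with ets() as the model and the built-in AirPassengers series as data. Both are assumptions chosen for illustration; any model function from the package fits the same pattern.

```r
library(forecast)

y <- AirPassengers           # example data only

fit <- ets(y)                # 1. fit a model
fc  <- forecast(fit, h = 12) # 2. forecast h = 12 periods ahead

autoplot(fc)                 # 3. plot the data with the forecasts
fc$mean                      # point forecasts
```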
Naive Models
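Benchmark your models against the naive and seasonal naive methods:
- naive(): every forecast equals the last observed value
- snaive(): each forecast equals the last observed value from the same season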
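A small sketch of both benchmarks, assuming the built-in AirPassengers series as example data:

```r
library(forecast)

y <- AirPassengers  # example data only

fc_naive  <- naive(y, h = 12)   # repeats the last observation
fc_snaive <- snaive(y, h = 12)  # repeats the last full seasonal cycle

fc_naive$mean
fc_snaive$mean
```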
Residuals
Residuals are the difference between the model’s fitted values and the actual data. Residuals should look like white noise and be:
- Uncorrelated
- Have mean zero
And ideally have:
- Constant variance
- A normal distribution
checkresiduals(): helper function to plot the residuals, plot the ACF and histogram, and do a Ljung-Box test on the residuals.
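A quick sketch of the residual checks, assuming a seasonal naive fit on the built-in AirPassengers series as the model under inspection:

```r
library(forecast)

fit <- snaive(AirPassengers)  # example model only

# Time plot, ACF, histogram, and a Ljung-Box test in one call
checkresiduals(fit)

# The residuals are also available directly for your own checks
res <- residuals(fit)
mean(res, na.rm = TRUE)  # should be close to zero for an unbiased model
```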
Evaluating Model Accuracy
Train/Test split with the window() function:
window(data, start, end): slices the ts data
Use accuracy() on the forecast and test set:
accuracy(model, testset): provides accuracy measures such as ME, RMSE, MAE, MAPE, and MASE
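The split and the accuracy check could look like the sketch below. The series (built-in AirPassengers), the split date, and the seasonal naive model are all illustrative assumptions.

```r
library(forecast)

y <- AirPassengers  # example data only

# Hold out the last two years as a test set
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

# Forecast exactly as many periods as the test set contains
fc <- snaive(train, h = length(test))

# One row for the training set, one for the test set
acc <- accuracy(fc, test)
acc
```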
Backtesting with one-step-ahead forecasts, also known as time series cross-validation, can be done with the helper function tsCV().
tsCV(): returns forecast errors given a forecastfunction that returns a forecast object and a number of steps ahead h. At h = 1 the errors are one-step-ahead forecast errors; for methods with no estimated parameters, such as naive(), they match the model residuals.
Here's an example using the naive() model, forecasting one period ahead:
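A minimal sketch of that call, assuming the built-in AirPassengers series as example data. Summarising the errors as an RMSE is one common choice, not the only one.

```r
library(forecast)

y <- AirPassengers  # example data only

# Error at time t is y[t + 1] minus the forecast made from data up to t
e <- tsCV(y, forecastfunction = naive, h = 1)

# Cross-validated RMSE; the final entry is NA because there is no y[t + 1]
sqrt(mean(e^2, na.rm = TRUE))
```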
Exponential Models
- ses(): Simple Exponential Smoothing, applies a smoothing parameter alpha to previous data
- holt(): Holt's linear trend, SES + trend parameter. Use damped = TRUE for a damped trend
- hw(): Holt-Winters method, incorporates linear trend and seasonality. Set seasonal = "additive" for the additive version or "multiplicative" for the multiplicative version
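The three functions can be sketched side by side as follows, assuming the built-in AirPassengers series as example data and a 12-period horizon:

```r
library(forecast)

y <- AirPassengers  # example data only

fc_ses  <- ses(y, h = 12)                              # level only, flat forecasts
fc_holt <- holt(y, h = 12, damped = TRUE)              # level + damped trend
fc_hw   <- hw(y, h = 12, seasonal = "multiplicative")  # level + trend + seasonality

autoplot(fc_hw)
```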
ETS Models
The forecast package includes a function ets() for your exponential smoothing models. ets() estimates parameters by maximizing the likelihood of the data arising from the model, and selects the best model using corrected AIC (AICc). The candidate components are (A = additive, M = multiplicative, N = none, Ad = additive damped):
- Error = {A, M}
- Trend = {N, A, Ad}
- Seasonal = {N, A, M}
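A short sketch of automatic ETS selection, assuming the built-in AirPassengers series as example data:

```r
library(forecast)

# With the default settings, ets() searches over the error, trend,
# and seasonal types and keeps the model with the lowest AICc
fit <- ets(AirPassengers)

fit$method  # the selected model, written as ETS(error, trend, seasonal)
fit$aicc    # AICc of the selected model

fc <- forecast(fit, h = 12)
autoplot(fc)
```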
Transformations
You may need to transform the data if it is non-stationary, to improve your model's predictions. To deal with non-constant variance, you can use a Box-Cox transformation.
BoxCox(): Box-Cox uses a lambda parameter, typically between -1 and 1, to stabilize the variance. A lambda of 0 performs a natural log, 1/3 does a cube root, etc., while 1 does nothing and -1 performs an inverse transformation.
Differencing is another transformation that uses differences between observations to model changes rather than the observations themselves.
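Both transformations can be sketched as below, assuming the built-in AirPassengers series as example data. BoxCox.lambda() is one way to pick lambda automatically; choosing it by eye from a plot is equally valid.

```r
library(forecast)

y <- AirPassengers  # example data only

# Box-Cox: pick lambda, transform, and transform back
lambda <- BoxCox.lambda(y)
y_bc   <- BoxCox(y, lambda)
y_back <- InvBoxCox(y_bc, lambda)  # recovers the original scale

# lambda = 0 is the natural log
y_log <- BoxCox(y, lambda = 0)

# Differencing: model the changes rather than the levels
dy  <- diff(y)            # lag-1 difference
dsy <- diff(y, lag = 12)  # seasonal difference for monthly data
```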
ARIMA
Parameters: (p,d,q)(P,D,Q)m
Parameter | Description |
---|---|
p | # of autoregression lags |
d | # of lag-1 differences |
q | # of Moving Average lags |
P | # of seasonal AR lags |
D | # of seasonal differences |
Q | # of seasonal MA lags |
m | # of observations per year |
Arima(): Implementation of the ARIMA function, set include.constant = TRUE to include drift, aka the constant.
auto.arima(): Automatic implementation of the ARIMA function in forecast. Estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. Can take a lambda argument to fit the model to transformed data, and the forecasts will be back-transformed onto the original scale. Set stepwise = FALSE to consider more models at the expense of more time.
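A sketch of both approaches, assuming the built-in AirPassengers series as example data. The manual orders below are an illustrative choice, not a recommendation for your data.

```r
library(forecast)

y <- AirPassengers  # example data only

# Manual specification: (p,d,q) = (0,1,1), (P,D,Q) = (0,1,1), fitted on log data
fit_manual <- Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1), lambda = 0)

# Automatic selection with the default stepwise search;
# add stepwise = FALSE for a wider but slower search
fit_auto <- auto.arima(y, lambda = 0)

arimaorder(fit_auto)              # the orders that were selected
fc <- forecast(fit_auto, h = 12)  # forecasts are back-transformed from the log scale
autoplot(fc)
```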
Dynamic Regression
Regression model with non-seasonal ARIMA errors, i.e. we allow e_t to be an ARIMA process rather than white noise.
Usage example:
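A rough sketch using simulated data, since the original example is not available. The predictor x, the coefficients, the AR(1) error structure, and the future predictor values are all assumptions made up for illustration.

```r
library(forecast)

set.seed(1)
n   <- 120
x   <- rnorm(n)                                             # hypothetical predictor
err <- as.numeric(arima.sim(model = list(ar = 0.6), n = n)) # AR(1) errors
y   <- ts(2 + 1.5 * x + err)                                # simulated response

# Regress y on x and let auto.arima() choose the ARIMA error structure
xreg <- cbind(x = x)
fit  <- auto.arima(y, xreg = xreg)
coef(fit)

# Forecasting requires future values of the predictor (here assumed to be 0);
# the horizon is taken from the number of rows supplied
new_xreg <- cbind(x = rep(0, 8))
fc <- forecast(fit, xreg = new_xreg)
```

The column names of the future xreg matrix need to match those used when fitting.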
Dynamic Harmonic Regression
Dynamic Regression with K Fourier terms to model seasonality. With higher K the model becomes more flexible.
Pro: Allows for seasonality of any length. Con: Assumes the seasonal pattern is unchanging. By contrast, seasonal models fitted with Arima() and auto.arima() may run out of memory at large seasonal periods (i.e. >200).
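One way to sketch this uses fourier() to build the regressors. The series (built-in AirPassengers), K = 2, and the 24-period horizon are illustrative assumptions; in practice K is often chosen by comparing AICc across several values.

```r
library(forecast)

y <- AirPassengers  # example data only
K <- 2              # number of sine/cosine pairs; at most half the seasonal period

# Seasonality is handled by the Fourier terms, so the ARIMA part is non-seasonal
fit <- auto.arima(y, xreg = fourier(y, K = K), seasonal = FALSE, lambda = 0)

# Future Fourier terms are generated by passing h to fourier()
fc <- forecast(fit, xreg = fourier(y, K = K, h = 24))
autoplot(fc)
```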
TBATS
Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: Automated, allows for complex seasonality that changes over time. Con: Slow.
- T: Trigonometric terms for seasonality
- B: Box-Cox transformations for heterogeneity
- A: ARMA errors for short term dynamics
- T: Trend (possibly damped)
- S: Seasonal (including multiple and non-integer periods)
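A minimal sketch of a TBATS fit, assuming the built-in AirPassengers series as example data. A series with multiple seasonal periods would usually be built with msts() first.

```r
library(forecast)

# tbats() chooses the Box-Cox, trend, ARMA, and Fourier components itself
fit <- tbats(AirPassengers)

fc <- forecast(fit, h = 12)
autoplot(fc)
```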