Cheat Sheet R



Getting started with the forecast package for time series data in R, as quickly as possible and with no explanations.

Source: Forecasting: Principles and Practice

Coerce your data to ts format:
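A minimal sketch, assuming the raw data is a plain numeric vector of monthly observations. The `values` vector and the 2020 start date are made up for illustration and are not from the original:

```r
# Hypothetical monthly observations; replace with your own numeric vector
values <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
            115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140)

# start = c(year, period within year); frequency = observations per year
y <- ts(values, start = c(2020, 1), frequency = 12)

print(y)
```

For quarterly data the same call would use frequency = 4.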

  • autoplot(): Useful function to plot data and forecasts

Seasonality

  • ggseasonplot(): Create a seasonal plot
  • ggsubseriesplot(): Create mini plots for each season and show seasonal means

Lags and ACF

  • gglagplot(): Plot the time series against lags of itself
  • ggAcf(): Plot the autocorrelation function (ACF)
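The plotting helpers listed so far can be tried in one short session. This is only a sketch: it assumes the forecast package is installed and uses R's built-in AirPassengers series as stand-in data, which is not part of the original cheat sheet.

```r
library(forecast)

y <- AirPassengers  # built-in monthly ts, used here only as example data

p_series <- autoplot(y)         # the series itself
p_season <- ggseasonplot(y)     # one line per year, plotted against month
p_subser <- ggsubseriesplot(y)  # mini plot per month with seasonal means
p_lag    <- gglagplot(y)        # series against lags of itself
p_acf    <- ggAcf(y)            # autocorrelation function

# Each object is a plot; print one to draw it
print(p_series)
```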

White Noise and the Ljung-Box Test

White Noise is another name for a time series of iid data. Purely random. Ideally your model residuals should look like white noise.

You can use the Ljung-Box test to check whether a time series is white noise. Here’s an example with 24 lags:
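A minimal sketch using Box.test() from base R's stats package. The series `wn` is simulated noise chosen for illustration; substitute your own series or model residuals.

```r
set.seed(42)
wn <- ts(rnorm(200))  # simulated white noise (an assumption for this example)

# fitdf = 0 because no model parameters were estimated on this series
lb <- Box.test(wn, lag = 24, fitdf = 0, type = "Ljung-Box")
print(lb)
```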

A p-value > 0.05 suggests the data are not significantly different from white noise.

The forecast package includes a few common models out of the box. Fit the model, then call forecast() on the fitted object with the number of periods h to predict; the result is a forecast object.

Example of the workflow:
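One possible version of this workflow, sketched with ets() as the model and the built-in AirPassengers series as data. Both choices are assumptions made for illustration; any model from the package slots in the same way.

```r
library(forecast)

y <- AirPassengers           # example data, not from the original

fit <- ets(y)                # 1. fit a model
fc  <- forecast(fit, h = 12) # 2. forecast h = 12 periods ahead
p   <- autoplot(fc)          # 3. plot history, point forecasts and intervals
print(p)
```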

Naive Models

Useful to benchmark against naive and seasonal naive models.

  • naive()
  • snaive()
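Both benchmarks take the series and a horizon h. A short sketch, again assuming AirPassengers as example data:

```r
library(forecast)

y <- AirPassengers

fc_naive  <- naive(y, h = 12)   # every forecast equals the last observation
fc_snaive <- snaive(y, h = 12)  # each forecast equals the same season last year

print(fc_naive$mean)
print(fc_snaive$mean)
```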

Residuals

Residuals are the difference between the model’s fitted values and the actual data. Residuals should look like white noise and be:

  • Uncorrelated
  • Have mean zero

And ideally have:

  • Constant variance
  • A normal distribution

checkresiduals(): helper function to plot the residuals, plot the ACF and histogram, and do a Ljung-Box test on the residuals.
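A short sketch of the residual check, assuming a seasonal naive fit on AirPassengers as the model under inspection (both are illustrative choices):

```r
library(forecast)

fit <- snaive(AirPassengers)

res <- residuals(fit)  # actual minus fitted values; NA for the first season
checkresiduals(fit)    # time plot, ACF, histogram and Ljung-Box test
```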

Evaluating Model Accuracy

Train/Test split with window function:

window(data, start, end): to slice the ts data

Use accuracy() on the model and test set

accuracy(model, testset): Provides accuracy measures like ME, RMSE, MAE, MAPE, MASE etc.
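The split and the accuracy check can be sketched together. The cut-off date, the seasonal naive model and the AirPassengers data are assumptions chosen for illustration:

```r
library(forecast)

y <- AirPassengers

# Hold out the last two years as a test set
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

# Forecast exactly as many periods as the test set contains
fc <- snaive(train, h = length(test))

# One row for the training set, one for the test set
acc <- accuracy(fc, test)
print(acc[, c("RMSE", "MAE", "MAPE")])
```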

Backtesting with one step ahead forecasts, aka “Time series cross validation” can be done with a helper function tsCV().

tsCV(): returns forecast errors given a forecastfunction that returns a forecast object and a number of steps ahead h. At h = 1 these are one-step-ahead forecast errors, which for a parameter-free method like naive() are the same values as the model residuals.

Here’s an example using the naive() model, forecasting one period ahead:
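A minimal sketch, assuming AirPassengers as the series. Summarising the errors as a mean squared error is one common choice, not something the original prescribes:

```r
library(forecast)

y <- AirPassengers

# One-step-ahead cross-validated errors from the naive method
e <- tsCV(y, forecastfunction = naive, h = 1)

# Summarise; some entries are NA because no later observation exists
mse <- mean(e^2, na.rm = TRUE)
print(mse)
```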

Exponential Models

  • ses(): Simple Exponential Smoothing, weights past observations using a smoothing parameter alpha
  • holt(): Holt’s linear trend, SES + trend parameter. Use damped = TRUE for a damped trend
  • hw(): Holt-Winters method, incorporates linear trend and seasonality. Set seasonal = "additive" for the additive version or seasonal = "multiplicative" for the multiplicative version
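The three functions share the same calling pattern. A sketch, assuming AirPassengers as example data and a 12-period horizon:

```r
library(forecast)

y <- AirPassengers

fc_ses  <- ses(y, h = 12)                            # level only
fc_holt <- holt(y, h = 12, damped = TRUE)            # level + damped trend
fc_hw   <- hw(y, h = 12, seasonal = "multiplicative") # level + trend + season

print(fc_hw$mean)
```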

ETS Models

The forecast package includes a function ets() for your exponential smoothing models. ets() estimates parameters using the likelihood of the data arising from the model, and selects the best model using corrected AIC (AICc).

  • Error = {A, M}
  • Trend = {N, A, Ad}
  • Seasonal = {N, A, M}
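A short sketch of automatic selection with ets(), assuming AirPassengers as example data. Which error, trend and seasonal combination gets picked depends on the data:

```r
library(forecast)

fit <- ets(AirPassengers)  # searches the candidate models, picks by AICc

print(fit$method)          # label of the selected model
fc <- forecast(fit, h = 24)
print(fc$mean)
```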

Transformations

You may need to transform the data if it is non-stationary to improve your model's predictions. To deal with non-constant variance, you can use a Box-Cox transformation.

BoxCox(): Box-Cox uses a lambda parameter between -1 and 1 to stabilize the variance. A lambda of 0 performs a natural log, 1/3 does a cube root, etc., while 1 does nothing and -1 performs an inverse transformation.
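A sketch of the transformation round trip, assuming AirPassengers as example data. BoxCox.lambda() is used here to suggest a lambda; choosing one by hand works the same way:

```r
library(forecast)

y <- AirPassengers

lambda <- BoxCox.lambda(y)         # data-driven suggestion for lambda
y_bc   <- BoxCox(y, lambda)        # transformed series
y_log  <- BoxCox(y, lambda = 0)    # lambda = 0 is the natural log
y_back <- InvBoxCox(y_bc, lambda)  # back to the original scale

print(lambda)
```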

Differencing is another transformation that uses differences between observations to model changes rather than the observations themselves.
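Differencing uses diff() from base R. The sketch below assumes AirPassengers as example data and takes logs first to steady the variance; ndiffs() and nsdiffs() from forecast suggest how many differences to apply:

```r
library(forecast)

y <- log(AirPassengers)

dy  <- diff(y)            # lag-1 difference: change from the previous month
dsy <- diff(y, lag = 12)  # seasonal difference: change from a year earlier

n_d  <- ndiffs(y)         # suggested number of lag-1 differences
n_sd <- nsdiffs(y)        # suggested number of seasonal differences
print(c(n_d, n_sd))
```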

ARIMA

Parameters: (p,d,q)(P,D,Q)m

Parameter   Description
p           # of autoregression lags
d           # of lag-1 differences
q           # of Moving Average lags
P           # of seasonal AR lags
D           # of seasonal differences
Q           # of seasonal MA lags
m           # of observations per year

Arima(): Fits an ARIMA model with the orders you specify. Set include.constant = TRUE to include the constant, aka drift.

auto.arima(): Automatic implementation of the ARIMA function in forecast. Estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. Can take a lambda argument to fit the model to transformed data, in which case the forecasts are back-transformed onto the original scale. Set stepwise = FALSE to consider more models at the expense of more time.
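Both functions can be sketched side by side. The orders passed to Arima() below are an illustrative choice, and AirPassengers with lambda = 0 (a log transform) is assumed as example data:

```r
library(forecast)

y <- AirPassengers

# Manual specification: ARIMA(0,1,1)(0,1,1) on the log scale
fit_manual <- Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1), lambda = 0)

# Automatic selection on the same transformed scale
fit_auto <- auto.arima(y, lambda = 0)

# Forecasts are back-transformed to the original scale
fc <- forecast(fit_auto, h = 12)
print(fc$mean)
```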

Dynamic Regression

Regression model with non-seasonal ARIMA errors, i.e. we allow e_t to be an ARIMA process rather than white noise.

Usage example:
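A minimal sketch with simulated data, since the original example is missing. The predictor `x`, its coefficient and the AR(1) error process are all made up for illustration:

```r
library(forecast)

set.seed(1)
n   <- 120
x   <- rnorm(n)                              # hypothetical predictor
err <- arima.sim(list(ar = 0.6), n = n)      # AR(1) errors instead of white noise
y   <- ts(2 + 0.5 * x + as.numeric(err))

# xreg must be a numeric matrix; keep the column name for forecasting
xreg <- matrix(x, ncol = 1, dimnames = list(NULL, "x"))
fit  <- auto.arima(y, xreg = xreg)

# Forecasting needs future predictor values; here the last value is held fixed
new_x <- matrix(rep(tail(x, 1), 8), ncol = 1, dimnames = list(NULL, "x"))
fc    <- forecast(fit, xreg = new_x)         # horizon = rows of new_x
print(fc$mean)
```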

Dynamic Harmonic Regression

Dynamic Regression with K Fourier terms to model seasonality. With higher K the model becomes more flexible.

Pro: Allows for seasonality of any length. Con: Assumes the seasonal pattern is unchanging. Useful because Arima() and auto.arima() may run out of memory at large seasonal periods (i.e. >200).
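A sketch using fourier() to build the regressors, assuming AirPassengers as example data and K = 2 as an arbitrary starting value. In practice K is often chosen by comparing AICc across fits:

```r
library(forecast)

y <- AirPassengers
K <- 2  # number of sine/cosine pairs (illustrative choice)

# Seasonality is carried by the Fourier terms, so the ARIMA part is non-seasonal
harmonics <- fourier(y, K = K)
fit <- auto.arima(y, xreg = harmonics, seasonal = FALSE, lambda = 0)

# Future Fourier terms for the next 24 periods
fc <- forecast(fit, xreg = fourier(y, K = K, h = 24))
print(fc$mean)
```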

TBATS

Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: Automated, allows for complex seasonality that changes over time. Con: Slow.

  • T: Trigonometric terms for seasonality
  • B: Box-Cox transformations for heterogeneity
  • A: ARMA errors for short term dynamics
  • T: Trend (possibly damped)
  • S: Seasonal (including multiple and non-integer periods)
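Fitting follows the same fit-then-forecast pattern as the other models. A sketch, assuming AirPassengers as example data (a small series, so the fit stays reasonably quick):

```r
library(forecast)

fit <- tbats(AirPassengers)  # all components are selected automatically
fc  <- forecast(fit, h = 24)

print(fc$mean)
```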