Getting started with the `forecast` package for time series data in R, as quickly as possible and with minimal explanation.
Source: Forecasting: Principles and Practice
Coerce your data to `ts` format:
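
A minimal sketch of the coercion, assuming a plain numeric vector of monthly observations that starts in January 2015. The values, the start date, and the name `sales` are made up for illustration.

```r
# Hypothetical monthly data: 36 observations starting January 2015
sales <- c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
           115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
           145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166)

# frequency = 12 marks the series as monthly; start = c(year, period)
sales_ts <- ts(sales, start = c(2015, 1), frequency = 12)

class(sales_ts)      # "ts"
frequency(sales_ts)  # 12
```

For quarterly data you would use `frequency = 4` instead.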
- `autoplot()`: Useful function to plot data and forecasts
Seasonality
- `ggseasonplot()`: Create a seasonal plot
- `ggsubseriesplot()`: Create mini plots for each season and show seasonal means
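
The plotting helpers above could be tried out as follows. This is a small sketch that assumes the built-in `AirPassengers` monthly series stands in for your own data.

```r
library(forecast)

# AirPassengers is a monthly ts object that ships with base R
p_series    <- autoplot(AirPassengers)         # the raw series over time
p_season    <- ggseasonplot(AirPassengers)     # one line per year, months on the x-axis
p_subseries <- ggsubseriesplot(AirPassengers)  # one mini panel per month, with its mean

print(p_series)
print(p_season)
print(p_subseries)
```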
Lags and ACF
- `gglagplot()`: Plot the time series against lags of itself
- `ggAcf()`: Plot the autocorrelation function (ACF)
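
A short sketch of both plots, again assuming `AirPassengers` as stand-in data. The lag counts chosen here are arbitrary.

```r
library(forecast)

# Scatterplots of the series against lags 1 to 12 of itself
p_lag <- gglagplot(AirPassengers, lags = 12)

# Autocorrelations up to lag 24 (two seasonal cycles for monthly data)
p_acf <- ggAcf(AirPassengers, lag.max = 24)

print(p_lag)
print(p_acf)
```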
White Noise and the Ljung-Box Test
White noise is another name for a time series of iid (independent and identically distributed) data: purely random, with no predictable structure. Ideally your model residuals should look like white noise.
You can use the Ljung-Box test to check whether a time series is white noise, for example with 24 lags.
A p-value > 0.05 suggests the data are not significantly different from white noise.
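
The Ljung-Box check described above can be sketched with base R's `Box.test()`. The simulated series `wn` is an assumption for illustration; substitute your own series or model residuals.

```r
# Simulated white noise stands in for real data (illustration only)
set.seed(42)
wn <- ts(rnorm(200))

# Ljung-Box test over the first 24 lags; fitdf = 0 because no model was fitted
lb <- Box.test(wn, lag = 24, fitdf = 0, type = "Ljung-Box")

lb$p.value  # compare against 0.05
```

When testing residuals from a fitted model, `fitdf` is usually set to the number of estimated parameters.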
The `forecast` package includes a few common models out of the box. Fit the model, then call the `forecast()` function on the fitted object with a number of `h` periods to predict. This returns a `forecast` object.
Example of the workflow:
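
A minimal sketch of the fit, forecast, plot sequence. The choice of `ets()` and of `AirPassengers` as data is an assumption for illustration, and any model function from the package could take its place.

```r
library(forecast)

fit <- ets(AirPassengers)     # 1. fit a model to the series
fc  <- forecast(fit, h = 12)  # 2. forecast h = 12 periods ahead
autoplot(fc)                  # 3. plot the data together with the forecast

fc$mean                       # point forecasts; fc$lower / fc$upper hold the intervals
```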
Naive Models
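It is useful to benchmark other models against the naive and seasonal naive models.
- `naive()`: every forecast equals the last observed value
- `snaive()`: every forecast equals the last observed value from the same season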
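
Both benchmarks can be produced in one line each. This sketch assumes `AirPassengers` as the data and a 12-month horizon.

```r
library(forecast)

# Naive: repeat the last observation for all 12 horizons
fc_naive <- naive(AirPassengers, h = 12)

# Seasonal naive: repeat the value from the same month of the last observed year
fc_snaive <- snaive(AirPassengers, h = 12)

fc_naive$mean
fc_snaive$mean
```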
Residuals
Residuals are the difference between the model’s fitted values and the actual data. Residuals should look like white noise and:
- Be uncorrelated
- Have mean zero
And ideally have:
- Constant variance
- A normal distribution
- `checkresiduals()`: helper function that plots the residuals, their ACF and histogram, and runs a Ljung-Box test on the residuals.
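
A brief sketch of the residual check, assuming a seasonal naive model on `AirPassengers` as the thing being diagnosed.

```r
library(forecast)

fit <- snaive(AirPassengers)

# Draws the residual time plot, ACF and histogram, and prints the Ljung-Box test
checkresiduals(fit)

# The raw residuals are also available for your own checks
res <- residuals(fit)
mean(res, na.rm = TRUE)  # should be close to zero for a well-behaved model
```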
Evaluating Model Accuracy
Train/test split with the `window()` function:
- `window(data, start, end)`: slice the `ts` data
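Use `accuracy()` on the forecast and the test set:
- `accuracy(model, testset)`: Provides accuracy measures such as RMSE, MAE, MAPE and MASE. For test-set measures, the first argument should be the `forecast` object produced from the model.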
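
The split-and-score steps above might look like this. The cut-off date and the use of `snaive()` are assumptions chosen for illustration.

```r
library(forecast)

# Hold out the last two years of AirPassengers as a test set
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))

# Fit on the training data only and forecast across the test period
fc <- snaive(train, h = length(test))

# One row for the training set, one for the test set
acc <- accuracy(fc, test)
acc[, c("RMSE", "MAE", "MAPE", "MASE")]
```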
Backtesting with one-step-ahead forecasts, aka "time series cross-validation", can be done with the helper function `tsCV()`.
- `tsCV()`: returns forecast errors given a `forecastfunction` that returns a `forecast` object and a number of steps ahead `h`. At `h = 1`, for a model with no estimated parameters such as `naive()`, the forecast errors are just the model residuals.
Here’s an example using the `naive()` model, forecasting one period ahead:
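
A minimal sketch, assuming `AirPassengers` as the series. The RMSE summary at the end is one common way to condense the errors, not the only one.

```r
library(forecast)

y <- AirPassengers

# e[t] is the error of the forecast for time t + 1 made with data up to time t
e <- tsCV(y, forecastfunction = naive, h = 1)

# Cross-validated RMSE; na.rm drops positions where no forecast could be scored
rmse_cv <- sqrt(mean(e^2, na.rm = TRUE))
rmse_cv
```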
Exponential Models
- `ses()`: Simple Exponential Smoothing, applies a smoothing parameter `alpha` to past observations
- `holt()`: Holt’s linear trend, SES + trend parameter. Use `damped = TRUE` for a damped trend
- `hw()`: Holt-Winters method, incorporates linear trend and seasonality. Set `seasonal = "additive"` for the additive version or `seasonal = "multiplicative"` for the multiplicative version
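
The three functions share the same calling pattern. Here is a sketch on `AirPassengers`, with the horizon and options chosen arbitrarily.

```r
library(forecast)

# Simple exponential smoothing: level only
fc_ses <- ses(AirPassengers, h = 12)

# Holt's linear trend with damping
fc_holt <- holt(AirPassengers, damped = TRUE, h = 12)

# Holt-Winters with multiplicative seasonality
fc_hw <- hw(AirPassengers, seasonal = "multiplicative", h = 12)

autoplot(fc_hw)
```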
ETS Models
The `forecast` package includes a function `ets()` for your exponential smoothing models. `ets()` estimates parameters by maximizing the likelihood of the data arising from the model, and selects the best model using the corrected AIC (AICc). The components it chooses among (A = additive, M = multiplicative, N = none, Ad = additive damped) are:
- Error = {A, M}
- Trend = {N, A, Ad}
- Seasonal = {N, A, M}
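
A short sketch of automatic ETS selection, assuming `AirPassengers` as the data. Which model gets selected depends on the series, so no particular result is shown.

```r
library(forecast)

# With the default model = "ZZZ", error, trend and seasonal types are all selected automatically
fit <- ets(AirPassengers)

fit$method  # label of the form "ETS(error,trend,seasonal)"
fit$aicc    # the AICc value used for model selection

fc <- forecast(fit, h = 24)
autoplot(fc)
```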
Transformations
You may need to transform the data if it is non-stationary to improve your model's predictions. To deal with non-constant variance, you can use a Box-Cox transformation.
- `BoxCox()`: Box-Cox uses a `lambda` parameter, typically between -1 and 1, to stabilize the variance. A `lambda` of 0 performs a natural log and 1/3 behaves like a cube root, while 1 leaves the shape of the data unchanged and -1 performs an inverse transformation.
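
A sketch of the transformation and its inverse on `AirPassengers`. Using `BoxCox.lambda()` to pick `lambda` is one option, and you can also set it by hand.

```r
library(forecast)

# Let the package suggest a lambda for this series
lambda <- BoxCox.lambda(AirPassengers)

y_bc   <- BoxCox(AirPassengers, lambda)  # variance-stabilized series
y_back <- InvBoxCox(y_bc, lambda)        # back on the original scale

# lambda = 0 is the natural log
y_log <- BoxCox(AirPassengers, lambda = 0)
```

Most model functions in the package also accept a `lambda` argument and back-transform the forecasts for you.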
Differencing is another transformation that uses differences between observations to model changes rather than the observations themselves.
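
Differencing can be sketched with base R's `diff()`. The log transform and the use of `ndiffs()` / `nsdiffs()` to suggest how many differences to take are illustrative choices, not a required recipe.

```r
library(forecast)

y_log <- log(AirPassengers)

d1  <- diff(y_log)            # lag-1 difference: change from the previous month
d12 <- diff(y_log, lag = 12)  # seasonal difference: change from the same month last year

# Suggested number of regular and seasonal differences
ndiffs(y_log)
nsdiffs(y_log)
```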
ARIMA
Parameters: (p,d,q)(P,D,Q)m
Parameter | Description |
---|---|
p | # of autoregression lags |
d | # of lag-1 differences |
q | # of Moving Average lags |
P | # of seasonal AR lags |
D | # of seasonal differences |
Q | # of seasonal MA lags |
m | # of observations per year |
- `Arima()`: Implementation of the ARIMA function. Set `include.constant = TRUE` to include drift, aka the constant
- `auto.arima()`: Automatic implementation of the ARIMA function in `forecast`. Estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. Can take a `lambda` argument to fit the model to transformed data, and the forecasts will be back-transformed onto the original scale. Set `stepwise = FALSE` to consider more models at the expense of more time.
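
A sketch of both routes on `AirPassengers`. The manual orders below are an arbitrary choice for illustration, and `lambda = 0` fits the model on the log scale.

```r
library(forecast)

# Manual specification: (p,d,q) = (0,1,1), seasonal (P,D,Q) = (0,1,1)
fit_manual <- Arima(AirPassengers, order = c(0, 1, 1),
                    seasonal = c(0, 1, 1), lambda = 0)

# Automatic search; add stepwise = FALSE for a wider but slower search
fit_auto <- auto.arima(AirPassengers, lambda = 0)

# Forecasts are back-transformed to the original scale
fc <- forecast(fit_auto, h = 24)
autoplot(fc)
```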
Dynamic Regression
Regression model with non-seasonal ARIMA errors, i.e. we allow e_t to be an ARIMA process rather than white noise.
Usage example:
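
A minimal sketch on simulated data. The series `sales`, the predictor `advert`, and the assumption that future `advert` values stay at their historical mean are all hypothetical and exist only to make the example runnable.

```r
library(forecast)

set.seed(1)
n      <- 120
advert <- rnorm(n, mean = 10)                # hypothetical predictor
noise  <- arima.sim(list(ar = 0.7), n = n)   # AR(1) errors instead of white noise
sales  <- ts(5 + 2 * advert + noise)         # hypothetical response

# Predictors go in through xreg as a numeric matrix
xreg <- cbind(advert = advert)
fit  <- auto.arima(sales, xreg = xreg)

# Forecasting requires future predictor values; the horizon is the number of rows
future_xreg <- cbind(advert = rep(mean(advert), 8))
fc <- forecast(fit, xreg = future_xreg)
```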
Dynamic Harmonic Regression
Dynamic regression with `K` Fourier terms to model seasonality. With higher `K` the model becomes more flexible.
Pro: Allows for seasonality of any length. Con: Assumes the seasonal pattern is unchanging.
Seasonal ARIMA models fitted with `Arima()` and `auto.arima()` may run out of memory at large seasonal periods (e.g. > 200), which is where Fourier terms help.
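
A sketch using `fourier()` to build the regressors, assuming `AirPassengers` as data and `K = 2` as an arbitrary starting point. In practice `K` is often chosen by comparing AICc across several values.

```r
library(forecast)

y <- AirPassengers
K <- 2  # number of sine/cosine pairs

# seasonal = FALSE: the Fourier terms carry the seasonality, ARIMA handles the errors
fit <- auto.arima(y, xreg = fourier(y, K = K), seasonal = FALSE, lambda = 0)

# Future Fourier terms are generated by passing the horizon h
fc <- forecast(fit, xreg = fourier(y, K = K, h = 24))
autoplot(fc)
```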
TBATS
Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: Automated, allows for complex seasonality that changes over time. Con: Slow.
- T: Trigonometric terms for seasonality
- B: Box-Cox transformations for heterogeneity
- A: ARMA errors for short term dynamics
- T: Trend (possibly damped)
- S: Seasonal (including multiple and non-integer periods)
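
A short sketch of fitting and forecasting with `tbats()`, assuming `AirPassengers` as data. Parallel processing is switched off here only to keep the example simple.

```r
library(forecast)

# Fitting can take a while on long series
fit <- tbats(AirPassengers, use.parallel = FALSE)

fc <- forecast(fit, h = 24)
autoplot(fc)
```

For data with several seasonal periods, the series is typically built with `msts()` before being passed to `tbats()`.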