This is the last in my series of posts on forecasting. The posts have focused on the ‘why’ of forecasting and also some of the practicalities of forecasting. This last post is going to be shorter. It is simply some links to forecasting resources that I’ve found useful or think could be useful – books, articles, online tutorials and software, as well as my opinions on which bits you should focus on learning.
Forecasting: Methods and Applications by Spyros Makridakis, Steven Wheelwright and Rob Hyndman. This was the book that was recommended to me by a colleague when I started in commercial forecasting. Rob Hyndman (one of the authors) says it is out of date, and recommends his later textbook (see next), but I still find it useful.
- Forecasting: Principles and Practice by Rob Hyndman and George Athanasopoulos is considered one of the modern bibles on classical forecasting techniques. It is now in its 3rd edition and also available online for free.
Introductory Time Series with R by Paul Cowpertwait and Andrew Metcalfe. I found this short but concise Springer book (in the Use R! series) on classic time-series analysis in R a great help. It was useful both from an R perspective, but also for short practical introductions and explanations of the various ARIMA concepts. Some of the links to the datasets used are now broken apparently, but I have seen comments that the resources are not hard to find with a google search.
- This recent and comprehensive review article in the International Journal of Forecasting is great (arxiv version here). It has short readable paragraphs and sections on a large number of concepts and forecasting topics, so you can simply pick the topic you’re interested in and read just that. Or read the whole article end-to-end if you want.
Rob Hyndman’s blog is the main blog I tend to routinely look at. It is always an excellent read and contains links to blogs that Hyndman recommends (although these tend to be more econometrics and statistics focused).
I’m only going to give links to free open-source software. There are some other excellent commercial applications available, but not everyone will be able to get access to them, so I won’t list them.
- R: I have tended to do most of my classical time-series analysis in R. The in-built arima functions and also the forecast package (created by Rob Hyndman and Yeasmin Khandakar) provide a great deal of functionality and are my go-to packages/functions in R for time-series.
- The statsmodels package in python provides a similar model building experience to building models in R. Consequently, its time-series functionality provides similar capabilities to that found in R.
- Darts package in python: I have done less time-series analysis in python than I have in R. When I have done exploratory time-series analysis in python I have tended to use statsmodels. Having said that, the Darts package, and the Kats package (from Facebook) look like useful python packages from the bits I have read.
- Prophet package: The Prophet package, from Facebook, is open-sourced, well used, flexible and very powerful. I have used it for a couple of tasks and like it. Under the hood it is based upon the Stan probabilistic programming language (PPL), which I have used a lot (both in and outside of my main employment). Prophet is fully automated but I would still recommend you have a basic grasp of classical time-series analysis concepts before you use Prophet, to guard against those situations where a fitted model is inappropriate or clearly wrong.
- The engineering team at Uber have also released their own forecasting package, Orbit, which performs Bayesian forecasts using various PPLs under the hood (similar to the way the Prophet package uses Stan).
Methods/Concepts/Techniques you should know about
ARIMA: You should definitely become familiar with the classical approaches to time-series analysis, namely ARIMA. The name is a combination of acronyms, and I’ve given a breakdown of the full acronym and what I think are the important aspects to know about.
- AR: Auto-Regressive. These are the ‘lag’ terms in the time-series model equation, whereby the response variable value at timepoint t can depend on the value of the response variable at previous time-points. It is important to understand, i) how the value of the lag coefficients affect the long-run mean and variance of the response variable, including how this determines whether a process is stationary or not, ii) how to determine the order of the AR terms, e.g. by looking at a Partial Auto-Correlation Function (PACF) plot, iii) how the AR terms are infinite impulse response (IIR) terms, in contrast to the finite impulse response (FIR) moving average terms.
- I: Integrated. This is the concept within ARIMA that most Data Scientists are least familiar with but a very important one, particularly when dealing with quantities that we naturally expect to grow over time, for example by accumulating increments that increase on average. It is important to understand, i) how to run unit-root tests to test for integrated series – beware the difference between Phillips-Perron (PP) and KPSS tests, as the null hypothesis is different between them, ii) Cointegration and spurious regression (spurious correlation) – for which Clive Granger won a Nobel memorial prize in Economics (shared with Robert Engle) in 2003.
- MA: Moving Average. These are the ‘error’ terms in the time-series model equation, whereby the response variable value at timepoint t can be affected by the stochastic error not just at t, but at previous timepoints as well. This allows the response variable to be affected by short timescale perturbations of finite duration (hence the classification of the moving average terms as being finite impulse response terms).
Even if you intend to use only neural network approaches to forecasting, or want to use, say, the Prophet package as an AutoML forecasting solution, it is still a good idea to get a good grasp of ARIMA models. Investing some time in getting to grips with the basics of ARIMA and doing some hands-on playing with ARIMA models will pay huge dividends.
- Error Correction Models (ECMs). You may never have need to use Error Correction Models, but they are useful for incorporating the transient departures from a long-term equilibrium. I found these University of Oxford summer school lectures notes given by Prof. Robin Best from Binghamton University (SUNY) a very good and accessible introduction to error correction models. The lecture notes also give excellent explanations of the concepts of integration and co-integration in time-series analysis. I used these lecture notes when I was having to develop an ECM for a long-range stress-testing model.
Neural Network and other Machine Learning techniques: Neural networks have been applied to time-series for a long time, but being blunt, until the last 7 years or so they weren’t very good and didn’t outperform the classical time series approaches (in my opinion). In part that was probably because machine learning practitioners tended to view time-series analysis and forecasting as ‘just another prediction problem’, and so the approaches didn’t really take into account the time-series nature of the data, e.g. the auto-regressive structure, the very things that make time-series what they are. Coupled with the fact that the classical time-series analysis approaches have a very solid theoretical underpinning in ARIMA (expressed in terms of the lag or backshift operator), this meant that machine learning approaches didn’t make as many inroads into time-series analysis as they did in other fields. However, with the advent of Deep Learning approaches using Recurrent Neural Networks and LSTM units, machine learning approaches to time-series analysis have really begun to make their mark. Models such as the DeepAR model from Salinas et al are now considered to outperform classical approaches for specific tasks. Hyndman’s book, “Forecasting: Principles and Practice” contains a chapter on machine learning approaches to time-series analysis, but in my opinion it is only very basic. The extensive review article by Petropoulos et al has a section on ‘Data-Driven Methods’ that includes sub-sections on Neural Networks and also Deep Probabilistic Forecasting Models. However, given the comprehensive nature of the whole review article these sub-sections are necessarily short. Other more extensive resources that I have found useful, which also cover the Deep Learning approaches include,
- This survey paper by Lim and Zohren.
- Lina Faik has an excellent and very up-to-date two-part blog-post on DeepAR and other seq2seq based approaches (Part 1 here, Part 2 here – on Medium) .
- There is a nice introductory TensorFlow tutorial on time-series forecasting with Neural Networks (including notebooks).
- This blog-post from Google on interpretable deep learning for time series forecasting.
That is it. I hope you have enjoyed this series of posts on forecasting. As with anything in Data Science, forecasting isn’t a spectator sport. The best way to learn is to download some datasets and start playing. You will make mistakes, but that is how you learn.
© 2022 David Hoyle. All Rights Reserved.