Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.
It is a very simple idea that can result in accurate forecasts on a range of time series problems.
In this tutorial, you will discover how to implement an autoregressive model for time series forecasting with Python.
After completing this tutorial, you will know:
How to explore your time series data for autocorrelation.
How to develop an autoregression model and use it to make predictions.
How to use a developed autoregression model to make rolling predictions.

Let’s get started.

Autoregression Models for Time Series Forecasting With Python
Photo by Umberto Salvagnin, some rights reserved.

Autoregression
A regression model, such as linear regression, models an output value based on a linear combination of input values.
For example:

    yhat = b0 + b1*X1

Where yhat is the prediction, b0 and b1 are coefficients found by optimizing the model on training data, and X1 is an input value.
This technique can be used on time series where input variables are taken as observations at previous time steps, called lag variables.
For example, we can predict the value for the next time step (t+1) given the observations at the last two time steps (t-1 and t-2). As a regression model, this would look as follows:

    X(t+1) = b0 + b1*X(t-1) + b2*X(t-2)

Because the regression model uses data from the same input variable at previous time steps, it is referred to as an autoregression (regression of self).
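To make the regression on lag variables concrete, here is a minimal sketch that estimates the three coefficients by ordinary least squares. The synthetic series, the helper name `fit_ar2`, and the choice of NumPy's `lstsq` are assumptions made for illustration only; the two lag variables are taken to be the two most recent observations.

```python
import numpy as np


def fit_ar2(x):
    """Estimate b0, b1, b2 so that x[t] ~ b0 + b1*x[t-1] + b2*x[t-2] (least squares)."""
    x = np.asarray(x, dtype=float)
    target = x[2:]    # value to predict
    lag1 = x[1:-1]    # most recent observation before the target
    lag2 = x[:-2]     # observation two steps before the target
    design = np.column_stack([np.ones_like(lag1), lag1, lag2])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Hypothetical data: a series generated from known coefficients plus noise.
rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 1.0 + 0.6 * x[t - 1] + 0.2 * x[t - 2] + rng.normal(scale=0.5)

b0, b1, b2 = fit_ar2(x)

# One-step forecast from the last two observations in the series.
next_value = b0 + b1 * x[-1] + b2 * x[-2]
print(b0, b1, b2, next_value)
```

With enough data, the estimated coefficients should land close to the values used to generate the series (1.0, 0.6 and 0.2 here).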

Autocorrelation
An autoregression model makes an assumption that the observations at previous time steps are useful to predict the value at the next time step.
This relationship between variables is called correlation.
If both variables change in the same direction (e.g. go up together or down together), this is called a positive correlation. If the variables move in opposite directions as values change (e.g. one goes up and one goes down), then this is called negative correlation.
We can use statistical measures to calculate the correlation between the output variable and values at previous time steps at different lags. The stronger the correlation between the output variable and a specific lagged variable, the more weight the autoregression model can put on that variable when modeling.
Again, because the correlation is calculated between the variable and itself at previous time steps, it is called an autocorrelation. It is also called serial correlation because of the sequenced structure of time series data.
The correlation statistics can also help to choose which lag variables will be useful in a model and which will not.
Interestingly, if all lag variables show low or no correlation with the output variable, then it suggests that the time series problem may not be predictable. This can be very useful when getting started on a new dataset.
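As a rough sketch of this idea, the snippet below compares the autocorrelation at a few lags for two synthetic series: one with a slow seasonal cycle and one that is pure noise. Both series and the chosen lags are assumptions made for illustration; `Series.autocorr()` is the pandas method for the correlation of a series with a lagged copy of itself.

```python
import numpy as np
from pandas import Series

rng = np.random.default_rng(7)
n = 1000

# Hypothetical data: a slow cycle plus noise, and pure noise for contrast.
steps = np.arange(n)
seasonal = Series(10 + 5 * np.sin(2 * np.pi * steps / 365) + rng.normal(scale=1.0, size=n))
noise = Series(rng.normal(size=n))

lags = [1, 2, 7, 30]
seasonal_acf = {lag: seasonal.autocorr(lag=lag) for lag in lags}
noise_acf = {lag: noise.autocorr(lag=lag) for lag in lags}

for lag in lags:
    print(lag, round(seasonal_acf[lag], 3), round(noise_acf[lag], 3))
```

The seasonal series should show clearly positive autocorrelation at every lag listed, while the noise series should stay near zero, which is the pattern that suggests there is little for a lag-based model to exploit.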
In this tutorial, we will investigate the autocorrelation of a univariate time series, then develop an autoregression model and use it to make predictions.
Before we do that, let’s first review the Minimum Daily Temperatures data that will be used in the examples.

Minimum Daily Temperatures Dataset
This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.
The units are in degrees Celsius and there are 3,650 observations. The source of the data is credited as the Australian Bureau of Meteorology.
Learn more about the dataset here.
Download the dataset into your current working directory with the filename “daily-minimum-temperatures.csv”.
The code below will load the dataset as a Pandas Series.
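
    from pandas import read_csv
    from matplotlib import pyplot

    # Series.from_csv() no longer exists in current pandas;
    # read_csv() followed by squeeze() returns the same date-indexed Series.
    series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
    print(series.head())  # first 5 rows
    series.plot()         # line plot of the whole series
    pyplot.show()

Running the example prints the first 5 rows from the loaded dataset.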

    Date
    1981-01-01    20.7
    1981-01-02    17.9
    1981-01-03    18.8
    1981-01-04    14.6
    1981-01-05    15.8
    Name: Temp, dtype: float64

A line plot of the dataset is then created.

Minimum Daily Temperature Dataset Plot

Quick Check for Autocorrelation
There is a quick, visual check that we can do to see if there is an autocorrelation in our time series dataset.
We can plot the observation at the previous time step (t-1) with the observation at the next time step (t+1) as a scatter plot.
This could be done manually by first creating a lagged version of the time series dataset and then using a built-in scatter plot function in the Pandas library.
But there is an easier way.
Pandas provides a built-in plot to do exactly this, called the lag_plot() function.
Below is an example of creating a lag plot of the Minimum Daily Temperatures dataset.
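
    from pandas import read_csv
    from matplotlib import pyplot
    # lag_plot lives in pandas.plotting; the old pandas.tools.plotting module was removed
    from pandas.plotting import lag_plot

    # Series.from_csv() no longer exists; read_csv() plus squeeze() returns the Series
    series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
    lag_plot(series)  # scatter plot of each observation against the one that follows it
    pyplot.show()

Running the example plots the temperature data (t) on the x-axis against the temperature on the next day (t+1) on the y-axis.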

Minimum Daily Temperature Dataset Lag Plot
We can see a large ball of observations along a diagonal line of the plot. It clearly shows a relationship or some correlation.
This process could be repeated for any other lagged observation, such as if we wanted to review the relationship with the last 7 days or with the same day last month or last year.
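As a sketch of how that repetition might look, `lag_plot()` accepts a `lag` argument, so several lags can be drawn side by side. The synthetic series below is only a stand-in for the temperature data, and the lags (7, 30 and 365 days) and the output filename `lag_plots.png` are assumptions for illustration.

```python
import numpy as np
from pandas import Series, date_range
from pandas.plotting import lag_plot
from matplotlib import pyplot

# Hypothetical stand-in for the temperature series: a yearly cycle plus noise.
rng = np.random.default_rng(3)
index = date_range("1981-01-01", periods=3 * 365, freq="D")
steps = np.arange(len(index))
values = 11 + 4 * np.sin(2 * np.pi * steps / 365) + rng.normal(scale=2.0, size=len(index))
series = Series(values, index=index)

# One panel per lag: last week, roughly last month, and last year.
lags = [7, 30, 365]
fig, axes = pyplot.subplots(1, len(lags), figsize=(12, 4))
for ax, lag in zip(axes, lags):
    lag_plot(series, lag=lag, ax=ax)
    ax.set_title("lag = %d days" % lag)

fig.tight_layout()
fig.savefig("lag_plots.png")  # or call pyplot.show() in an interactive session
```

To try this on the real data, replace the synthetic `series` with the Series loaded from the CSV file.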
Another quick check that we can do is to directly calculate the correlation between the observation and the lag variable.
We can use a statistic like the Pearson correlation coefficient. This produces a number between -1 (negatively correlated) and +1 (positively correlated) that summarizes how correlated two variables are. Values close to zero indicate low correlation, and values above 0.5 or below -0.5 indicate high correlation.
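For intuition, the coefficient can be computed directly from its definition: the sum of products of the centered values, divided by the square root of the product of their summed squares. The sketch below does this for the five temperatures printed earlier, paired with their previous-day values. The helper name `pearson` is hypothetical, and five points are far too few for a meaningful estimate; they are used only to keep the arithmetic visible.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator


# First five observations of the dataset, as printed above.
observations = np.array([20.7, 17.9, 18.8, 14.6, 15.8])
previous = observations[:-1]  # value at t-1
current = observations[1:]    # value at t+1 (the following observation)

r = pearson(previous, current)
r_numpy = np.corrcoef(previous, current)[0, 1]  # NumPy's built-in, for comparison
print(r, r_numpy)
```

The two printed values should agree, since `np.corrcoef` computes the same statistic.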
Correlation can be calculated easily using the corr() function on the DataFrame of the lagged dataset.
The example below creates a lagged version of the Minimum Daily Temperatures dataset and calculates a correlation matrix of each column with other columns, including itself.
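
    from pandas import read_csv
    from pandas import DataFrame
    from pandas import concat

    # Series.from_csv() no longer exists; read_csv() plus squeeze() returns the Series
    series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
    values = DataFrame(series.values)
    # shift(1) moves each value down one row, so the first column holds the previous observation
    dataframe = concat([values.shift(1), values], axis=1)
    dataframe.columns = ['t-1', 't+1']
    # correlation matrix of each column with every column, including itself
    result = dataframe.corr()
    print(result)

This is a good confirmation for the plot above.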
It