How to Check if Time Series Data is Stationary with Python

Time series is different from more traditional classification and regression predictive modeling problems.

The temporal structure adds an order to the observations. This imposed order means that important assumptions about the consistency of those observations need to be handled specifically.

For example, when modeling, there are assumptions that the summary statistics of observations are consistent. In time series terminology, we refer to this expectation as the time series being stationary.

These assumptions can be easily violated in time series by the addition of a trend, seasonality, and other time-dependent structures.

In this tutorial, you will discover how to check if your time series is stationary with Python.

After completing this tutorial, you will know:

How to identify obvious stationary and non-stationary time series using line plots.
How to spot check summary statistics like mean and variance for a change over time.
How to use statistical tests with statistical significance to check if a time series is stationary.

Let’s get started.


Photo by Susanne Nilsson, some rights reserved.

Stationary Time Series

The observations in a stationary time series are not dependent on time.

Time series are stationary if they do not have trend or seasonal effects. Summary statistics calculated on the time series, like the mean or the variance of the observations, are consistent over time.
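The idea that summary statistics stay consistent over time is usually formalized as weak (covariance) stationarity. A standard textbook statement for a process $X_t$, with notation that is ours rather than the article's, is:

```latex
\begin{aligned}
\mathbb{E}[X_t] &= \mu && \text{for all } t,\\
\operatorname{Var}(X_t) &= \sigma^2 < \infty && \text{for all } t,\\
\operatorname{Cov}(X_t, X_{t+h}) &= \gamma(h) && \text{depends only on the lag } h.
\end{aligned}
```

Strict stationarity asks for more: the joint distribution of any $(X_{t_1}, \dots, X_{t_k})$ must equal that of $(X_{t_1+\tau}, \dots, X_{t_k+\tau})$ for every shift $\tau$.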

When a time series is stationary, it can be easier to model. Many statistical modeling methods assume or require the time series to be stationary to be effective.

Below is an example of the Daily Female Births dataset that is stationary.

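from pandas import read_csv
from matplotlib import pyplot
# Series.from_csv was removed from pandas; load the one-column CSV and squeeze it to a Series
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
series.plot()
pyplot.show()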

Running the example creates the following plot.


Daily Female Births Dataset Plot

Non-Stationary Time Series

Observations from a non-stationary time series show seasonal effects, trends, and other structures that depend on the time index.

Summary statistics like the mean and variance change over time, introducing a drift in the concepts a model may try to capture.
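One way to see this drift is to compute statistics over a rolling window. The sketch below uses a synthetic series (a linear trend plus a 12-step sine wave, both chosen for illustration and not taken from the article's datasets) and pandas' `Series.rolling`:

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): linear trend with
# slope 2.0 plus a sine wave that repeats every 12 steps.
t = np.arange(120)
values = 100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 12)
series = pd.Series(values)

# A 12-step window spans exactly one seasonal cycle, so the sine term
# averages out and the rolling mean tracks the trend.
rolling_mean = series.rolling(window=12).mean()
rolling_std = series.rolling(window=12).std()

# For a stationary series both would hover around constants;
# here the rolling mean keeps climbing.
print(rolling_mean.dropna().iloc[[0, -1]])
print(rolling_std.dropna().describe())
```

With these assumed inputs the rolling mean rises by the trend slope (2.0) at every step, which is exactly the kind of change over time that rules out stationarity.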

Classical time series analysis and forecasting methods are concerned with making non-stationary time series data stationary by identifying and removing trends and seasonal effects.
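Differencing is one common way to remove these structures. As a sketch, using made-up data with a linear trend of 0.5 per step and a fixed 12-step seasonal pattern, pandas' `Series.diff` can remove each in turn:

```python
import numpy as np
import pandas as pd

# Made-up data: linear trend (slope 0.5) plus a repeating 12-step pattern.
t = np.arange(48)
pattern = [5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, -6]
series = pd.Series(10 + 0.5 * t + np.tile(pattern, 4))

# Lag-1 difference y[t] - y[t-1]: the linear trend becomes a constant
# offset (the slope), but the seasonal pattern still shows up.
first_diff = series.diff(1).dropna()

# Lag-12 (seasonal) difference y[t] - y[t-12]: the pattern cancels and
# the linear trend leaves only the constant 12 * slope.
seasonal_diff = series.diff(12).dropna()

print(first_diff.head(12).tolist())
print(seasonal_diff.unique())
```

With these assumed inputs every seasonal difference equals 12 * 0.5 = 6.0, a constant and therefore trivially stationary series.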

Below is an example of the Airline Passengers dataset that is non-stationary, showing both trend and seasonal components.

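from pandas import read_csv
from matplotlib import pyplot
# Series.from_csv was removed from pandas; load the one-column CSV and squeeze it to a Series
series = read_csv('international-airline-passengers.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
series.plot()
pyplot.show()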

Running the example creates the following plot.


Non-Stationary Airline Passengers Dataset

Types of Stationary Time Series

The notion of stationarity comes from the theoretical study of time series, and it is a useful abstraction when forecasting.

There are some finer-grained notions of stationarity that you may come across if you dive deeper into this topic. They are:

Stationary Process: A process that generates a stationary series of observations.
Stationary Model: A model that describes a stationary series of observations.
Trend Stationary: A time series that does not exhibit a trend.
Seasonal Stationary: A time series that does not exhibit seasonality.
Strictly Stationary: A mathematical definition of a stationary process, specifically that the joint distribution of observations is invariant to time shift.

Stationary Time Series and Forecasting

Should you make your time series stationary?

Generally, yes.

If you have clear trend and seasonality in your time series, then model these components, remove them from observations, then train models on the residuals.

If we fit a stationary model to data, we assume our data are a realization of a stationary process. So our first step in an analysis should be to check whether there is any evidence of a trend or seasonal effects and, if there is, remove them.

― Page 122, Introductory Time Series with R .
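A minimal sketch of the "model the components, remove them, train on the residuals" recipe, assuming a linear trend and a known seasonal period: fit the trend by least squares with `np.polyfit`, estimate each seasonal position's effect as the average detrended value at that position, and keep what is left as residuals. The helper name `remove_trend_and_season` and the test data are hypothetical; libraries such as statsmodels offer more complete decompositions (for example `seasonal_decompose`).

```python
import numpy as np

def remove_trend_and_season(values, period):
    """Hypothetical helper: subtract a least-squares linear trend, then the
    mean detrended value at each position of the seasonal cycle."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))
    slope, intercept = np.polyfit(t, values, deg=1)
    detrended = values - (slope * t + intercept)
    phase = t % period  # position of each observation within the cycle
    seasonal = np.array([detrended[phase == p].mean() for p in range(period)])
    residuals = detrended - seasonal[phase]
    return residuals, (slope, intercept), seasonal

# Made-up data: trend + 12-step pattern + small Gaussian noise.
rng = np.random.default_rng(1)
t = np.arange(96)
pattern = np.tile([5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, -6], 8)
values = 20 + 0.8 * t + pattern + rng.normal(scale=0.5, size=96)

residuals, trend_params, seasonal = remove_trend_and_season(values, period=12)
print(round(float(np.std(values)), 2), round(float(np.std(residuals)), 2))
```

To forecast, predictions from a model trained on `residuals` would be added back to the extrapolated trend and the matching seasonal mean.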

Statistical time series methods and even modern machine learning methods will benefit from the clearer signal in the data.

But…

We turn to machine learning methods when the classical methods fail, or when we want more or better results. We cannot know in advance how best to model unknown nonlinear relationships in time series data, and some methods may perform better when working with non-stationary observations or with some mixture of stationary and non-stationary views of the problem.

The suggestion here is to treat properties of a time series being stationary or not as another source of information that can be used in feature engineering and feature selection on your time series problem when using machine learning methods.
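As a hedged illustration of that idea, the hypothetical helper below places a raw (possibly non-stationary) view of a series next to differenced and rolling-window views, so a downstream model or feature-selection step can decide which ones help. The helper name, column names and default window are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

def stationarity_views(series, window=12):
    """Hypothetical helper: raw, differenced and rolling-window views of a series."""
    frame = pd.DataFrame({"value": series})                 # raw level (may trend)
    frame["lag_1"] = series.shift(1)                        # previous observation
    frame["diff_1"] = series.diff(1)                        # step change (detrended view)
    frame["rolling_mean"] = series.rolling(window).mean()   # local level
    frame["rolling_std"] = series.rolling(window).std()     # local spread
    return frame.dropna()                                   # drop warm-up rows with NaNs

series = pd.Series(np.arange(30, dtype=float))
features = stationarity_views(series, window=5)
print(features.head())
```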

Checks for Stationarity

There are many methods to check whether a time series (direct observations, residuals, or otherwise) is stationary or non-stationary.

Look at Plots: You can review a time series plot of your data and visually check if there are any obvious trends or seasonality.
Summary Statistics: You can review the summary statistics for your data for seasons or random partitions and check for obvious or significant differences.
Statistical Tests: You can use statistical tests to check if the expectations of stationarity are met or have been violated.
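For the statistical-test route, the best-known choice is the (augmented) Dickey-Fuller unit-root test, available in statsmodels as `statsmodels.tsa.stattools.adfuller`. To show what such a test computes, here is a simplified, non-augmented Dickey-Fuller regression written with NumPy only. It omits lagged-difference terms and p-values, and compares against the approximate large-sample 5% critical value of -2.86 for the constant-only case; treat it as a teaching sketch, not a replacement for `adfuller`.

```python
import numpy as np

# Approximate large-sample 5% critical value, regression with a constant.
DF_CRITICAL_5PCT = -2.86

def dickey_fuller_tstat(values):
    """Simplified (non-augmented) Dickey-Fuller test: regress
    diff(y)[t] = alpha + gamma * y[t-1] + e[t] and return the t-statistic of gamma.
    The null hypothesis gamma == 0 means a unit root (non-stationary)."""
    y = np.asarray(values, dtype=float)
    dy = np.diff(y)
    y_lag = y[:-1]
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    coef, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)                                          # stationary
growth = np.exp(0.01 * np.arange(500)) + rng.normal(scale=0.1, size=500)  # growing

print(dickey_fuller_tstat(noise) < DF_CRITICAL_5PCT)
print(dickey_fuller_tstat(growth) < DF_CRITICAL_5PCT)
```

A statistic below the critical value rejects the unit-root null, which is evidence that the series is stationary; the steadily growing series produces a large positive statistic and fails to reject.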

Above, we have already introduced the Daily Female Births and Airline Passengers datasets as stationary and non-stationary respectively with plots showing an obvious lack and presence of trend and seasonality components.

Next, we will look at a quick and dirty way to calculate and review summary statistics on our time series dataset for checking to see if it is stationary.

Summary Statistics

A quick and dirty check to see if your time series is non-stationary is to review summary statistics.

You can split your time series into two (or more) partitions and compare the mean and variance of each group. If they differ and the difference is statistically significant, the time series is likely non-stationary.
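A minimal NumPy sketch of the split-and-compare check, using `np.array_split` to cut the series into two halves; the helper name and the test series are illustrative. Judging whether a difference is statistically significant needs a test on top of this, for example Welch's t-test via `scipy.stats.ttest_ind(..., equal_var=False)` for the means.

```python
import numpy as np

def compare_halves(values):
    """Hypothetical helper: (mean, variance) of the first and second halves."""
    values = np.asarray(values, dtype=float)
    first, second = np.array_split(values, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())

# A pure upward trend: the variance matches across halves, but the mean jumps.
trend = np.arange(100, dtype=float)
(mean1, var1), (mean2, var2) = compare_halves(trend)
print(f"mean1={mean1:.1f} mean2={mean2:.1f}")
print(f"var1={var1:.2f} var2={var2:.2f}")
```

Here the two halves have identical variance (208.25) but means of 24.5 and 74.5, a clear sign that the mean is not stable over time.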

Next, let’s try this approach on the Daily Births dataset.

Daily Births Dataset

Because we are looking at the mean and variance, we are assuming that the data conforms to a Gaussian (also called the bell curve or normal) distribution.

We can also quickly check this by eyeballing a histogram of our observations.

fro
