You may have observations at the wrong frequency.
Maybe they are too granular or not granular enough. The Pandas library in Python provides the capability to change the frequency of your time series data.
In this tutorial, you will discover how to use Pandas in Python to both increase and decrease the sampling frequency of time series data.
After completing this tutorial, you will know:
About time series resampling, the two types of resampling, and the two main reasons why you need to use them.
How to use Pandas to upsample time series data to a higher frequency and interpolate the new observations.
How to use Pandas to downsample time series data to a lower frequency and summarize the higher frequency observations.
Let's get started.

How To Resample and Interpolate Your Time Series Data With Python
Photo by sung ming whang, some rights reserved.
Resampling
Resampling involves changing the frequency of your time series observations.
Two types of resampling are:
Upsampling: Where you increase the frequency of the samples, such as from minutes to seconds.
Downsampling: Where you decrease the frequency of the samples, such as from days to months.
In both cases, new values must be created: upsampling invents observations at time steps that were never recorded, and downsampling replaces groups of observations with a single summary value.
In the case of upsampling, care may be needed in determining how the fine-grained observations are calculated using interpolation. In the case of downsampling, care may be needed in selecting the summary statistics used to calculate the new aggregated values.
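To make the two directions concrete, here is a minimal sketch on a made-up three-point monthly series (the values and dates are illustrative assumptions, not taken from the shampoo data). It upsamples to daily frequency and fills the gaps by linear interpolation, then downsamples the daily result to quarterly frequency using the mean as the summary statistic.

```python
import pandas as pd

# A small, made-up series of three monthly observations (illustrative values only)
monthly = pd.Series(
    [10.0, 20.0, 40.0],
    index=pd.date_range('2000-01-01', periods=3, freq='MS'),  # 'MS' = month start
)

# Upsampling: monthly -> daily. The new days start out as NaN and must be filled,
# here by linear interpolation between the known month-start values.
daily = monthly.resample('D').asfreq().interpolate(method='linear')

# Downsampling: daily -> quarterly. Many observations collapse into one row,
# so a summary statistic (here the mean) has to be chosen.
quarterly = daily.resample('QS').mean()

print(len(daily))  # 2000-01-01 through 2000-03-01, inclusive
print(quarterly)
```

Swapping mean() for sum(), max() or another aggregation is exactly the "choice of summary statistic" the paragraph above warns about.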
There are perhaps two main reasons why you may be interested in resampling your time series data:
Problem Framing: Resampling may be required if your data is not available at the same frequency at which you want to make predictions.
Feature Engineering: Resampling can also be used to provide additional structure or insight into the learning problem for supervised learning models.
There is a lot of overlap between these two cases.
For example, you may have daily data and want to predict a monthly problem. You could use the daily data directly or you could downsample it to monthly data and develop your model.
A feature engineering perspective may use observations and summaries of observations from both time scales, and possibly others, when developing a model.
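As a rough sketch of both uses, assuming a synthetic daily series (not the shampoo data), the code below downsamples daily values to monthly totals for a monthly prediction target, and separately keeps the daily rows while attaching each day's monthly mean as an extra input. The column name monthly_mean is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for January and February 2001 (synthetic values 0, 1, 2, ...)
days = pd.date_range('2001-01-01', '2001-02-28', freq='D')
daily = pd.Series(np.arange(len(days), dtype=float), index=days, name='sales')

# Problem framing: to predict monthly totals, downsample daily data to monthly sums.
monthly_total = daily.resample('MS').sum()

# Feature engineering: keep the daily rows, but add a summary from the coarser
# monthly time scale (each day's monthly mean) as an extra column.
features = daily.to_frame()
features['monthly_mean'] = daily.groupby(daily.index.to_period('M')).transform('mean')

print(monthly_total)
print(features.head())
```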
Let’s make resampling more concrete by looking at a real dataset and some examples.
Shampoo Sales Dataset
This dataset describes the monthly number of sales of shampoo over a three-year period.
The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).
Below is a sample of the first 5 rows of data, including the header row.
```
"Month","Sales"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
```
Below is a plot of the entire dataset taken from Data Market.

Shampoo Sales Dataset
The dataset shows an increasing trend and possibly some seasonal components.
Download and learn more about the dataset here.
Load the Shampoo Sales Dataset
Download the dataset and place it in the current working directory with the filename "shampoo-sales.csv".
The timestamps in the dataset do not have an absolute year, but do have a month. We can write a custom date parsing function to load this dataset and pick an arbitrary year, such as 1900, to baseline the years from.
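The parsing idea can be checked in isolation before loading the file. This sketch applies a parser of that form to a few month labels in the file's format. Note that prefixing "190" only works for single-digit year labels, such as those in this three-year dataset; a label like "10-01" would become "19010-01" and fail to parse.

```python
from datetime import datetime

# Month labels look like "1-01" (year 1, January); prefixing "190" turns them
# into "1901-01", which strptime can parse with the '%Y-%m' format.
def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

# Labels taken from the first rows of the file
for label in ['1-01', '1-02', '1-05']:
    print(label, '->', parser(label).date())
```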
Below is a snippet of code that loads the Shampoo Sales dataset with read_csv() and uses the custom date parsing function to build the date index.
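```python
from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot

# Month labels look like "1-01"; prefix "190" so year 1 becomes 1901
def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

# Load the file with the first column as the index and squeeze the single
# remaining column into a Series
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
# Convert the month labels into real timestamps with the custom parser
series.index = series.index.map(parser)
print(series.head())
series.plot()
pyplot.show()
```
Running this example loads the dataset and prints the first 5 rows. This shows the correct handling of the dates, baselined from 1900.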
```
Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales of shampoo over a three year period, dtype: float64
```
We also get a plot of the dataset, showing the rising trend in sales from month to month.

Plot of the Shampoo Sales Dataset
Upsample Shampoo Sales
The observations in the Shampoo Sales dataset are monthly.
Imagine we wanted daily sales information. We would have to upsample the frequency from monthly to daily and use an interpolation scheme to fill in the new daily frequency.
The Pandas library provides a function called resample() on the Series and DataFrame objects. This can be used to group records when downsampling and to make space for new observations when upsampling.
We can use this function to transform our monthly dataset into a daily dataset by calling resample() and specifying the calendar day frequency alias, "D".
Pandas is clever, and you could just as easily specify the frequency as "1D" or even something domain specific, such as "5D". See the further reading section at the end of the tutorial for the list of aliases that you can use.
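As a sketch of how the alias changes the result, the code below inlines the five sample rows shown earlier (an assumption so it runs without the CSV file) and resamples them with "D", "1D" and "5D". "D" and "1D" should give identical daily rows. With "5D", resample() builds 5-day bins starting on the first date, so each month-start value lands in whichever bin contains it and may be labeled with an earlier date; for example, 1 February falls in the bin that starts on 31 January.

```python
from datetime import datetime
from io import StringIO
import pandas as pd

# The first five rows of the file, inlined so the sketch runs without the CSV
csv_text = '''"Month","Sales"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
'''

def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

series = pd.read_csv(StringIO(csv_text), header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)

# Resample with three frequency aliases and report rows vs. rows holding data
results = {alias: series.resample(alias).mean() for alias in ['D', '1D', '5D']}
for alias, resampled in results.items():
    print(alias, len(resampled), 'rows,', int(resampled.notna().sum()), 'with data')
```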
```python
from datetime import datetime
from pandas import read_csv

def parser(x):
    return datetime.strptime('190' + x, '%Y-%m')

series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = series.index.map(parser)
# Upsample from monthly to calendar-day frequency; days with no observation become NaN
upsampled = series.resample('D').mean()
print(upsampled.head(32))
```
Running this example prints the first 32 rows of the upsampled dataset, showing each day of January and the first day of February.
```
Month
1901-01-01    266.0
1901-01-02      NaN
1901-01-03      NaN
1901-01-04      NaN
1901-01-05      NaN
1901-01-06      NaN
1901-01-07      NaN
1901-01-08      NaN
1901-01-09      NaN
1901-01-10      NaN
1901-01-11      NaN
1901-01-12      NaN
1901-01-13      NaN
1901-01-14      NaN
1901-01-15      NaN
1901-01-16      NaN
1901-01-17      NaN
1901-01-18      NaN
1901-01-19      NaN
1901-01-20      NaN
1901-01-21      NaN
1901-01-22      NaN
1901-01-23      NaN
1901-01-24      NaN
1901-01-25      NaN
1901-01-26      NaN
1901-01-27      NaN
1901-01-28      NaN
1901-01-29      NaN
1901-01-30      NaN
1901-01-31      NaN
1901-02-01    145.9
```
We can see that the resample() function has created the new daily rows and filled them with NaN values. We still have the sales volume on the first of January and February from the original data.
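For plain upsampling, where each new day holds at most one original observation, calling asfreq() on the resampler is a common alternative to mean(). This sketch rebuilds the five sample rows in memory (an assumption so it runs without the file), checks that both approaches give the same NaN-filled daily series, and counts the new empty rows.

```python
import pandas as pd

# The five sample values shown earlier, indexed at the start of each month of 1901
series = pd.Series(
    [266.0, 145.9, 183.1, 119.3, 180.3],
    index=pd.date_range('1901-01-01', periods=5, freq='MS'),
    name='Sales',
)

upsampled_mean = series.resample('D').mean()      # the approach used above
upsampled_asfreq = series.resample('D').asfreq()  # keep values at matching timestamps only

print(upsampled_mean.equals(upsampled_asfreq))
print(int(upsampled_mean.isna().sum()), 'new empty days')
```

asfreq() makes the intent explicit (no aggregation is happening), whereas mean() only works here because each daily bin contains at most one value.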
Next, we can interpolate the missing values at this new frequency.
The Pandas Series object provides an interpolate() function to interpolate missing values, and there is a nice selection of simple and more complex interpolation methods. You may have domain knowledge to help choose how values are to be interpolated.
A good starting point is to use linear interpolation. This draws a straight line between the available data points, in this case on the first of each month, and fills in values at the chosen frequency from this line.
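Here is a minimal sketch of linear interpolation on the five sample rows, rebuilt in memory rather than read from the file. After upsampling to daily frequency, interpolate(method='linear') fills each gap along the straight line between neighboring month-start values. Note that method='linear' treats rows as equally spaced and ignores the timestamps; that is fine for an evenly spaced daily index, and method='time', which weights by the actual time gaps, should agree here.

```python
import pandas as pd

# The five sample values shown earlier, indexed at the start of each month of 1901
series = pd.Series(
    [266.0, 145.9, 183.1, 119.3, 180.3],
    index=pd.date_range('1901-01-01', periods=5, freq='MS'),
    name='Sales',
)

# Upsample to daily (new days are NaN), then fill them along straight lines
upsampled = series.resample('D').mean()
interpolated = upsampled.interpolate(method='linear')

# Time-weighted interpolation; identical here because the daily rows are evenly spaced
by_time = upsampled.interpolate(method='time')

print(interpolated.head(3))
```

For 2 January, for example, the filled value is 266.0 + (145.9 - 266.0) / 31, one thirty-first of the way from the January value to the February value.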
from pandas import read_csv
from datetime import datetime
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0],