Time series forecasting is a process, and the only way to get good forecasts is to practice this process.
In this tutorial, you will discover how to forecast the number of monthly armed robberies in Boston with Python.
Working through this tutorial will provide you with a framework for the steps and the tools for working through your own time series forecasting problems.
After completing this tutorial, you will know:
- How to check your Python environment and carefully define a time series forecasting problem.
- How to create a test harness for evaluating models, develop a baseline forecast, and better understand your problem with the tools of time series analysis.
- How to develop an autoregressive integrated moving average (ARIMA) model, save it to file, and later load it to make predictions for new time steps.

Let's get started.

Time Series Forecast Case Study with Python: Monthly Armed Robberies in Boston
Photo by Tim Sackton, some rights reserved.
Overview

In this tutorial, we will work through a time series forecasting project from end-to-end, from downloading the dataset and defining the problem to training a final model and making predictions.
This project is not exhaustive, but shows how you can get good results quickly by working through a time series forecasting problem systematically.
The steps of this project that we will work through are as follows:
1. Environment.
2. Problem Description.
3. Test Harness.
4. Persistence.
5. Data Analysis.
6. ARIMA Models.
7. Model Validation.

This will provide a template for working through a time series prediction problem that you can use on your own dataset.
1. Environment

This tutorial assumes an installed and working SciPy environment and dependencies, including:
- SciPy
- NumPy
- Matplotlib
- Pandas
- scikit-learn
- statsmodels

I used Python 2.7. Are you on Python 3? Let me know how you go in the comments.
This script will help you check your installed versions of these libraries.
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# statsmodels
import statsmodels
print('statsmodels: {}'.format(statsmodels.__version__))

The results on the workstation used to write this tutorial are as follows:
scipy: 0.18.1
numpy: 1.11.2
matplotlib: 1.5.3
pandas: 0.19.1
sklearn: 0.18.1
statsmodels: 0.6.1

2. Problem Description

The problem is to predict the number of monthly armed robberies in Boston, USA.
The dataset provides the number of monthly armed robberies in Boston from January 1966 to October 1975, or just under 10 years of data.
The values are a count and there are 118 observations.
The dataset is credited to McCleary & Hay (1980).
You can learn more about this dataset and download it directly from DataMarket.
Download the dataset as a CSV file and place it in your current working directory with the filename “robberies.csv”.
3. Test Harness

We must develop a test harness to investigate the data and evaluate candidate models.
This involves two steps:
1. Defining a Validation Dataset.
2. Developing a Method for Model Evaluation.

3.1 Validation Dataset

The dataset is not current. This means that we cannot easily collect updated data to validate the model.
Therefore we will pretend that it is October 1974 and withhold the last one year of data from analysis and model selection.
This final year of data will be used to validate the final model.
The code below will load the dataset as a Pandas Series and split it in two: one file for model development (dataset.csv) and another for validation (validation.csv).
from pandas import Series
series = Series.from_csv('robberies.csv', header=0)
split_point = len(series) - 12
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
dataset.to_csv('dataset.csv')
validation.to_csv('validation.csv')

Running the example creates two files and prints the number of observations in each.
Dataset 106, Validation 12

The specific contents of these files are:
- dataset.csv: Observations from January 1966 to October 1974 (106 observations)
- validation.csv: Observations from November 1974 to October 1975 (12 observations)

The validation dataset is about 10% of the original dataset.
Note that the saved datasets do not have a header line, so we must account for this when working with these files later.
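Because the saved files are headerless, later scripts must say so explicitly when loading them. Below is a minimal sketch of what this looks like; it uses a small in-memory stand-in rather than the real dataset.csv, illustrative counts rather than the actual robbery values, and the more current pandas.read_csv in place of the older Series.from_csv used elsewhere in this tutorial.

```python
import io
import pandas as pd

# A tiny headerless stand-in for dataset.csv (illustrative values,
# not the real robberies counts): one "date,count" pair per line.
csv_text = "1966-01-01,41\n1966-02-01,39\n1966-03-01,50\n"

# header=None tells pandas there is no header line to skip over;
# index_col=0 and parse_dates=True give a datetime-indexed frame.
frame = pd.read_csv(io.StringIO(csv_text), header=None,
                    index_col=0, parse_dates=True)
series = frame.iloc[:, 0]  # squeeze the single column to a Series

print(len(series))     # 3
print(series.iloc[0])  # 41
```

If header=None were omitted, pandas would silently treat the first observation as a header row and drop it from the data.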
3.2 Model Evaluation

Model evaluation will only be performed on the data in dataset.csv prepared in the previous section.
Model evaluation involves two elements:
1. Performance Measure.
2. Test Strategy.

3.2.1 Performance Measure

The observations are a count of robberies.
We will evaluate the performance of predictions using the root mean squared error (RMSE). This will give more weight to predictions that are grossly wrong and will have the same units as the original data.
Any transforms to the data must be reversed before the RMSE is calculated and reported to make the performance between different methods directly comparable.
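As an illustration of reversing a transform before scoring, suppose a model forecasts in log space (a hypothetical choice for this sketch, with made-up values): the predictions must be passed back through exp() before computing the RMSE, so that the reported error is in units of robberies.

```python
from math import sqrt, log, exp
from sklearn.metrics import mean_squared_error

# Illustrative counts (not the real robberies data).
test = [40.0, 43.0, 38.0]

# Suppose the model worked in log space and produced these forecasts.
log_predictions = [log(41.0), log(42.0), log(39.0)]

# Invert the transform first, then score in the original units.
predictions = [exp(p) for p in log_predictions]
rmse = sqrt(mean_squared_error(test, predictions))
print('RMSE: %.3f' % rmse)  # RMSE: 1.000
```

Scoring the log-space values directly would produce an RMSE in log units, which cannot be compared with models that forecast raw counts.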
We can calculate the RMSE using the mean_squared_error() helper function from the scikit-learn library, which calculates the mean squared error between a list of expected values (the test set) and a list of predictions. We can then take the square root of this value to give us an RMSE score.
For example:
from sklearn.metrics import mean_squared_error
from math import sqrt
...
test = ...
predictions = ...
mse = mean_squared_error(test, predictions)
rmse = sqrt(mse)
print('RMSE: %.3f' % rmse)

3.2.2 Test Strategy

Candidate models will be evaluated using walk-forward validation.
This is because a rolling-forecast type model is required by the problem definition: one-step forecasts are needed given all available data.
The walk-forward validation will work as follows:
1. The first 50% of the dataset will be held back to train the model.
2. The remaining 50% of the dataset will be iterated over, testing the model.
3. For each step in the test dataset:
   - A model will be trained.
   - A one-step prediction will be made and stored for later evaluation.
   - The actual observation from the test dataset will be added to the training dataset for the next iteration.
4. The predictions made during the iteration of the test dataset will be evaluated and an RMSE score reported.

Given the small size of the data, we will allow a model to be re-trained given all available data prior to each prediction.
We can write the code for the test harness.
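Putting these rules together, the walk-forward loop might be sketched as follows. Synthetic counts stand in for the contents of dataset.csv, and a naive persistence forecast stands in for a real candidate model, which would instead be re-fit on the growing history at each step.

```python
from math import sqrt
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the monthly counts in dataset.csv.
X = [41.0, 39.0, 50.0, 40.0, 43.0, 38.0, 44.0, 35.0, 39.0, 35.0]

# Hold back the first 50% to train; walk forward over the rest.
train_size = int(len(X) * 0.50)
train, test = X[0:train_size], X[train_size:]
history = [x for x in train]
predictions = []
for t in range(len(test)):
    # A real candidate model would be re-fit on 'history' here;
    # persistence (the last observed value) is a placeholder.
    yhat = history[-1]
    predictions.append(yhat)
    # Add the actual observation so the next step sees all data.
    history.append(test[t])

rmse = sqrt(mean_squared_error(test, predictions))
print('RMSE: %.3f' % rmse)
```

The key property of the loop is that each prediction uses only observations available up to that time step, mimicking how the model would be used in operation.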