This week Pronto CycleShare, Seattle's bicycle share system, turned one year old. To celebrate this, Pronto made available a large cache of data from the first year of operation and announced the Pronto Cycle Share Data Challenge, which offers prizes for different categories of analysis.
There are a lot of tools out there that you could use to analyze data like this, but my tool of choice is (obviously) Python. In this post, I want to show how you can get started analyzing this data and joining it with other available data sources using the PyData stack, namely NumPy, Pandas, Matplotlib, and Seaborn. Here I'll take a look at some of the basic questions you can answer with this data. Later I hope to find the time to dig deeper and ask some more interesting and creative questions. Stay tuned!
For those who aren't familiar, this post is composed in the form of a Jupyter Notebook, an open document format that combines text, code, data, and graphics and is viewable through the web browser. If you have not used it before, I encourage you to try it out! You can download the notebook containing this post here, open it with Jupyter, and start asking your own questions of the data.
Downloading Pronto's Data

We'll start by downloading the data (available on Pronto's website), which you can do by uncommenting the following shell commands (the exclamation mark here is a special IPython syntax to run a shell command). The total download is about 70MB, and the unzipped files are around 900MB.
In[1]:
# !curl -O https://s3.amazonaws.com/pronto-data/open_data_year_one.zip
# !unzip open_data_year_one.zip
Next we need some standard Python package imports:
In[2]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns; sns.set()
And now we load the trip data with Pandas:
In[3]:
trips = pd.read_csv('2015_trip_data.csv',
                    parse_dates=['starttime', 'stoptime'],
                    infer_datetime_format=True)
trips.head()

Out[3]:
   trip_id           starttime            stoptime    bikeid  tripduration    from_station_name                                    to_station_name from_station_id to_station_id       usertype  gender  birthyear
0      431 2014-10-13 10:31:00 2014-10-13 10:48:00  SEA00298       985.935  2nd Ave & Spring St  Occidental Park / Occidental Ave S & S Washing...          CBD-06         PS-04  Annual Member    Male       1960
1      432 2014-10-13 10:32:00 2014-10-13 10:48:00  SEA00195       926.375  2nd Ave & Spring St  Occidental Park / Occidental Ave S & S Washing...          CBD-06         PS-04  Annual Member    Male       1970
2      433 2014-10-13 10:33:00 2014-10-13 10:48:00  SEA00486       883.831  2nd Ave & Spring St  Occidental Park / Occidental Ave S & S Washing...          CBD-06         PS-04  Annual Member  Female       1988
3      434 2014-10-13 10:34:00 2014-10-13 10:48:00  SEA00333       865.937  2nd Ave & Spring St  Occidental Park / Occidental Ave S & S Washing...          CBD-06         PS-04  Annual Member  Female       1977
4      435 2014-10-13 10:34:00 2014-10-13 10:49:00  SEA00202       923.923  2nd Ave & Spring St  Occidental Park / Occidental Ave S & S Washing...          CBD-06         PS-04  Annual Member    Male       1971

Each row of this trip dataset is a single ride by a single person, and the data contains over 140,000 rows!
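As a minimal sketch of what `parse_dates` buys us (using a tiny inline CSV as a stand-in for Pronto's file, with made-up trip IDs), the parsed columns come back as true `datetime64` values, so datetime arithmetic just works:

```python
import io
import pandas as pd

# A tiny stand-in for the trip CSV (not real Pronto data)
csv = io.StringIO(
    "trip_id,starttime,stoptime\n"
    "431,2014-10-13 10:31:00,2014-10-13 10:48:00\n"
    "432,2014-10-13 10:32:00,2014-10-13 10:48:00\n"
)
df = pd.read_csv(csv, parse_dates=['starttime', 'stoptime'])

# The parsed columns are genuine datetimes, not strings,
# so we can subtract them directly:
durations = df['stoptime'] - df['starttime']
print(df.dtypes['starttime'])  # datetime64[ns]
print(durations.iloc[0])       # 0 days 00:17:00
```

Without `parse_dates`, those columns would load as plain strings and the subtraction would fail.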
Exploring Trips over Time

Let's start by looking at the trend in the number of daily trips over the course of the year.
In[4]:
# Extract the date and hour from each trip's start time
ind = pd.DatetimeIndex(trips.starttime)
trips['date'] = ind.date.astype('datetime64')
trips['hour'] = ind.hour

In[5]:
# Count trips by date
by_date = trips.pivot_table('trip_id', aggfunc='count',
                            index='date',
                            columns='usertype')
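To make the `pivot_table` call concrete, here is a toy version on a made-up miniature trip table (the dates and user types below are invented for illustration): counting `trip_id` per `date` yields one row per day and one column per `usertype`.

```python
import pandas as pd

# Made-up miniature trip table (values invented for illustration)
trips = pd.DataFrame({
    'trip_id': [1, 2, 3, 4, 5],
    'date': pd.to_datetime(['2014-10-13', '2014-10-13', '2014-10-13',
                            '2014-10-14', '2014-10-14']),
    'usertype': ['Annual Member', 'Annual Member',
                 'Short-Term Pass Holder',
                 'Annual Member', 'Short-Term Pass Holder'],
})

# Same shape of call as above: count trips per date,
# with one column per user type
by_date = trips.pivot_table('trip_id', aggfunc='count',
                            index='date', columns='usertype')
print(by_date)
```

The result is a small two-column frame: 2 annual-member trips and 1 short-term trip on the first day, 1 of each on the second, which is exactly the structure the plotting code below relies on.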
In[6]:
fig, ax = plt.subplots(2, figsize=(16, 8))
fig.subplots_adjust(hspace=0.4)
by_date.il