Introduction to Trainspotting

Here at Silicon Valley Data Science, we have a slight obsession with the Caltrain. Our interest stems from the fact that half of our employees rely on the Caltrain to get to work each day. We also want to give back to the community, and we love when we can do that with data. In addition to helping clients build robust data systems or use data to solve business challenges, we like to work on R&D projects to explore technologies and experiment with new algorithms, hypotheses, and ideas. We previously analyzed delays using Caltrain’s real-time API to improve arrival predictions, and we have modeled the sounds of passing trains to tell them apart. In this post we’ll start looking at the nuts and bolts of making our Caltrain work possible.

http://www.svds.com/wp-content/uploads/2016/09/side-by-side.mp4

If you have ever ridden the train, you know that the delay estimates Caltrain provides can be a bit… off. Sometimes a train will remain “two minutes delayed” for ten minutes after it was supposed to have departed, or delays will be reported when the train is on time. The idea for Trainspotting came from our desire to integrate new data sources for delay prediction beyond scraping Caltrain’s API. Since we had previously set up a Raspberry Pi to analyze train whistles, we thought it would be fun to validate the data coming from the Caltrain API by capturing real-time video and audio of trains passing by our office near the Mountain View station.

There were several questions we wanted our IoT Raspberry Pi train detector to answer:

Is there a train passing?
Which direction is it going?
How fast is the train moving?

Sound alone is pretty good at answering the first question because trains are rather loud. To help answer the rest of the questions, we added a camera to our Raspberry Pi to capture video.
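As a rough illustration of why loudness goes a long way toward answering the first question, here is a minimal sketch that flags a passing train when several consecutive blocks of audio exceed a loudness threshold. The function names, the threshold, and the synthetic samples are all assumptions made for this example; it is not the detector described later in the series.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def train_passing(blocks, threshold, min_blocks=3):
    """Return True if at least `min_blocks` consecutive blocks are louder than `threshold`."""
    run = 0
    for block in blocks:
        # Extend the run on a loud block, reset it on a quiet one.
        run = run + 1 if rms(block) > threshold else 0
        if run >= min_blocks:
            return True
    return False


# Synthetic audio blocks (assumed values, normalized to the range -1..1).
quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.6, -0.7, 0.65, -0.55]

print(train_passing([quiet, loud, loud, loud, quiet], threshold=0.3))   # True
print(train_passing([quiet, loud, quiet, loud, quiet], threshold=0.3))  # False
```

Requiring a run of loud blocks rather than a single one keeps short noises, such as a door slamming, from counting as a train.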

We’ll describe this process in a series of posts. They will focus on:

Introduction to Trainspotting (you are here)
Image Processing in Python
Streaming Video Analysis with Python
Streaming Audio Analysis and Sensor Fusion
Recognizing Images on a Raspberry Pi
Connecting an IoT Device to the Cloud
Building a Deployable IoT Device

Let’s quickly look at what these pieces will cover.

Walking through Trainspotting

In the upcoming “Image Processing in Python” post, Data Scientist Chloe Mawer demonstrates how to use open-source Python libraries, including OpenCV, to process images and videos and detect trains and their direction. You can also see her recent talk from PyCon 2016.
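To give a feel for how direction can be read from video, here is a minimal sketch of frame differencing: compare each frame with an empty background, find the changed pixels, and watch which way their centroid moves. It uses plain Python lists in place of OpenCV so that it runs on its own. The function names, threshold, and synthetic frames are assumptions for illustration, not the approach from the upcoming post.

```python
def motion_centroid_x(background, frame, threshold=30):
    """Mean column index of pixels that differ from the background by more than `threshold`."""
    cols = [
        x
        for bg_row, row in zip(background, frame)
        for x, (bg, px) in enumerate(zip(bg_row, row))
        if abs(px - bg) > threshold
    ]
    return sum(cols) / len(cols) if cols else None


def direction(background, frames, threshold=30):
    """Return 'right', 'left', or None based on how the moving region's centroid shifts."""
    xs = [motion_centroid_x(background, f, threshold) for f in frames]
    xs = [x for x in xs if x is not None]
    if len(xs) < 2 or xs[-1] == xs[0]:
        return None
    return "right" if xs[-1] > xs[0] else "left"


def synthetic_frame(start_col, width=8, height=4, blob=2):
    """Grayscale frame with a bright block `blob` columns wide starting at `start_col`."""
    return [
        [200 if start_col <= x < start_col + blob else 0 for x in range(width)]
        for _ in range(height)
    ]


background = synthetic_frame(start_col=-10)  # block is off-screen, so the frame is empty
frames = [synthetic_frame(c) for c in (0, 2, 4)]

print(direction(background, frames))        # right
print(direction(background, frames[::-1]))  # left
```

Whether "right" means northbound or southbound depends on where the camera sits relative to the track.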

In “Streaming Video Analysis with Python,” Data Scientist Colin Higgins and Data Engineer Matt Rubashkin describe the steps to take the video analysis to the next level: implementing streaming, on-Pi video analysis with multithreading, and light/dark adaptation. The figure below gives a peek into some of the challenges in detecting trains in varied light conditions.

Challenges in detecting trains in varied light conditions
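The two ideas named above, multithreading and light/dark adaptation, can be sketched with the standard library alone. One thread plays the role of the camera and pushes frames onto a queue; a second thread analyzes them against a brightness baseline that slowly follows the ambient light. The baseline update rule, the `alpha` and `margin` parameters, and the flat-list "frames" are assumptions chosen for this example, one plausible approach rather than the one used on our Pi.

```python
import queue
import threading


def capture(frames, q):
    """Producer: push frames onto the queue, then a None sentinel to signal the end."""
    for frame in frames:
        q.put(frame)
    q.put(None)


def analyze(q, results, alpha=0.5, margin=40):
    """Consumer: flag frames whose mean brightness deviates from a running baseline."""
    baseline = None
    while True:
        frame = q.get()
        if frame is None:
            break
        brightness = sum(frame) / len(frame)
        if baseline is None:
            baseline = brightness
        moving = abs(brightness - baseline) > margin
        results.append(moving)
        if not moving:
            # Adapt only on quiet frames so a passing train is not absorbed into the baseline.
            baseline = (1 - alpha) * baseline + alpha * brightness


# Ambient light brightens gradually (100 -> 140); one frame jumps to 200.
frames = [[v] * 16 for v in (100, 110, 120, 200, 130, 140)]

q = queue.Queue(maxsize=4)
results = []
worker = threading.Thread(target=analyze, args=(q, results))
worker.start()
capture(frames, q)
worker.join()

print(results)  # [False, False, False, True, False, False]
```

The bounded queue keeps the capture side from running arbitrarily far ahead of the analysis side, which matters on a device with limited memory.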

In a previous post mentioned above, Listening to Caltrain, we analyzed frequency profiles to discriminate between local and express trains passing our Sunnyvale office. Since that post, SVDS has grown and moved to Mountain View. After the move, we found that the pattern of train sounds was different in the new location, so we needed a more flexible approach. In “Streaming Audio Analysis and Sensor Fusion,” Colin describes the audio processing and a custom sensor fusion architecture that controls both video and audio.

Custom sensor fusion architecture

After we were able to detect trains, their speed, and their direction, we ran into a new problem: our Pi was not only detecting Caltrains (true positives), but also detecting Union Pacific freight trains and the VTA light rail (false positives). In order to reduce our detector’s false positive rate, we used convolutional neural networks implemented in TensorFlow, Google’s machine learning library. We implemented a custom Inception-V3 model trained on thousands of images of vehicles to identify different types of trains with >95% accuracy. Matt details this solution in “Recognizing Images on a Raspberry Pi.”
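For readers who want the terminology pinned down, the sketch below computes the false positive rate and precision from confusion-matrix counts. The counts are invented purely to show the arithmetic; they are not measurements from our detector, and the function name is hypothetical.

```python
def detector_metrics(tp, fp, tn, fn):
    """Return false positive rate and precision from confusion-matrix counts.

    `fn` is accepted so the full confusion matrix can be passed in,
    but neither of these two metrics depends on it.
    """
    return {
        # Share of non-Caltrain events that were wrongly reported as Caltrains.
        "false_positive_rate": fp / (fp + tn),
        # Share of reported Caltrains that really were Caltrains.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: motion-only detection vs. detection plus an image classifier.
before = detector_metrics(tp=90, fp=30, tn=70, fn=10)
after = detector_metrics(tp=90, fp=5, tn=95, fn=10)

print(before["false_positive_rate"])  # 0.3
print(after["false_positive_rate"])   # 0.05
```

In this made-up example, the classifier rejects most freight and light rail events, so the false positive rate falls while the number of true Caltrain detections stays the same.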

In “Connecting an IoT Device to the Cloud,” Matt shows how we connected our Pi to the cloud using Kafka, allowing monitoring with Grafana and persistence in HBase.

Monitoring our Pi with Grafana

The tools and next steps

Before we even finished the development on our first device, we wanted to set up more of these devices to get ground truth at other points along the track. With this in mind, we realized that we couldn’t always guarantee that we’d have a speedy internet connection, and we wanted to keep the devices themselves affordable. These requirements make the Raspberry Pi a great choice. The Pi has enough horsepower to do on-device stream processing so that we can send smaller, processed data streams over internet connections, and the parts are cheap. The total cost of our hardware for this sensor is $130, and the code relies only on open source libraries. In “Building a Deployable IoT Device,” we’ll walk through the device hardware and setup in detail and show you where you can get the code so you can start Trainspotting for yourself.

Device and hardware setup supplies
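To make the point about on-device processing concrete, here is a minimal sketch of collapsing many per-frame detections into one small event record before anything leaves the Pi. The record fields, the `summarize` helper, and the synthetic detections are all hypothetical; the actual message format is not described in this post.

```python
import json


def summarize(detections, fps):
    """Collapse per-frame detections into one compact event record.

    `detections` is a list of (frame_index, centroid_x) tuples for frames where motion was seen.
    """
    first_frame, first_x = detections[0]
    last_frame, last_x = detections[-1]
    return {
        "start_s": first_frame / fps,
        "duration_s": (last_frame - first_frame) / fps,
        "direction": "right" if last_x > first_x else "left",
        "frames": len(detections),
    }


# Synthetic pass: motion seen in frames 120..239, centroid drifting to the right.
detections = [(i, 10.0 * (i - 120)) for i in range(120, 240)]

event = summarize(detections, fps=30)
payload = json.dumps(event)

print(payload)
print(len(payload), "bytes summarized vs", len(json.dumps(detections)), "bytes raw")
```

Sending one summary per train instead of a record per frame is what keeps the stream small enough for a slow connection.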

If you want to learn more about Trainspotting and Data Science at SVDS, stay tuned for our future Trainspotting blog posts, and you can sign up for our newsletter here. Let us know which pieces of this series you’re most interested in.

You can also find our “Caltrain Rider” in the Android and Apple app stores. Our app is built upon the Hadoop Ecosystem including HBase and Spark, and relies on Kafka and Spark Streaming for ingestion and processing of Twitter sentiment and Caltrain API data.
