Automating High-Frequency Trading Price Change Forecast using Machine Learning T ...

Project author Raja Sekhar Vinnakota

Project mentor Nitesh Khandelwal

This article outlines the work undertaken by the author as part of his final project submitted in the Executive Programme in Algorithmic Trading (EPAT) at QuantInsti. You can view the author’s entire project work by clicking on the downloadable button.

In the project, the author demonstrates the use of various machine learning techniques for forecasting mid-price movements from limit order book (LOB) dynamics. A simple trading strategy was also tested and shown to achieve profitable returns on the sample data. The sample data consists of the two free trading days of TAQ NYSE OpenBook data.

The following machine learning techniques/tools were tested on the sample data:

Simple baseline models (stratified, most_frequent): Python sklearn (DummyClassifier)
RandomForest: H2O, Python, R, Spark, XGBoost
Support Vector Machines: Python (sklearn)
Logistic Regression: Python (sklearn)
LDA: Python (sklearn)
KNN: Python (sklearn)
CART: Python (sklearn)
NB: Python (sklearn)
Autosklearn
TPOT
LSTM: Keras/Theano
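To make the two baseline strategies concrete, here is a minimal sketch of what "most_frequent" and "stratified" predictions amount to. It is written in plain Python for illustration and is not the author's code, which used sklearn's DummyClassifier; the function names and the toy labels are assumptions.

```python
import random
from collections import Counter


def most_frequent_baseline(train_labels, n_predictions):
    """Always predict the most common class seen in the training labels."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    return [majority_class] * n_predictions


def stratified_baseline(train_labels, n_predictions, seed=0):
    """Predict classes at random, weighted by their training frequencies."""
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.choices(classes, weights=weights, k=n_predictions)


# Toy labels, made up for illustration: 0 = up, 1 = down, 2 = stationary
train = [0, 1, 2, 2, 2, 1, 2, 0, 2, 2]
print(most_frequent_baseline(train, 5))
print(stratified_baseline(train, 5))
```

Any real model should beat both of these; if it does not, the features are adding little beyond the class frequencies.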

This exhaustive project work was carried out on the following Amazon EC2/GPU instances:

ML Model/tool | Amazon Instance Type | vCPU | Mem (GiB) | Storage
RandomForest/baseline/other classification models | c4.2xlarge | 8 | 15 | EBS-only
Auto-sklearn, TPOT | r3.xlarge | 4 | 30.5 | SSD (GB) 1 x 80
Deep Learning | g2.2xlarge | 32 (4 GPUs) | 15 | SSD (GB) 1 x 60

Model framework

The model framework has been shown below.

The scala-openbook library (Eugene Z.) was used for parsing the NYSE TAQ data.

Orderbook-dynamics (Eugene Z.) was used for order book construction and feature extraction. The code base was upgraded to Spark 1.6.1 and the relevant Spark ML changes were added.

The training/test dataset obtained from the feature extraction step was used for training and validating a Random Forest classification model with H2O, R, XGBoost, scikit-learn and Spark.

Various other classification models were also tested using Python’s scikit-learn.

An LSTM RNN model was tested using Keras.

AutoML was tested using auto-sklearn and TPOT.

Methodology

The feature space was chosen as a subset of the feature vector set shown in the table below. Feature vectors are calculated from the LOB snapshots over a configured time window (Δ). The direction of the mid-price movement, where the mid-price is the average of the best bid and best ask, was used as the class label. An upward label (0) is assigned to a data point if the mid-price one label duration (4Δ) later is larger than the mid-price of the current data point. Similarly, a downward label (1) is assigned if it is smaller, and a stationary label (2) if it is unchanged. This is implemented using two cursors, one for the attributes and one for the label.
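The labeling rule can be sketched in a few lines of Python. This is an illustration and not the author's implementation (which builds on the Scala orderbook-dynamics code). It assumes one snapshot per time window Δ, so that a label duration of 4Δ means looking four rows ahead; the function name and the toy prices (in integer cents) are made up.

```python
UP, DOWN, STATIONARY = 0, 1, 2  # class codes as given in the article


def label_mid_price_moves(best_bids, best_asks, horizon=4):
    """Label each snapshot by the mid-price move `horizon` snapshots later.

    Assumes one snapshot per time window, so horizon=4 stands for 4 windows.
    Snapshots without a full look-ahead are dropped.
    """
    mids = [(bid + ask) / 2.0 for bid, ask in zip(best_bids, best_asks)]
    labels = []
    # Attribute cursor i walks the data points; label cursor i + horizon looks ahead.
    for i in range(len(mids) - horizon):
        now, later = mids[i], mids[i + horizon]
        if later > now:
            labels.append(UP)
        elif later < now:
            labels.append(DOWN)
        else:
            labels.append(STATIONARY)
    return labels


# Toy best bid/ask quotes in cents (hypothetical values)
bids = [1000, 1000, 1001, 1001, 1002, 1000, 1000, 999]
asks = [1002, 1002, 1003, 1003, 1004, 1002, 1002, 1001]
print(label_mid_price_moves(bids, asks))  # [0, 2, 1, 1]
```

With eight snapshots and a horizon of four, only the first four data points receive a label; the remaining four have no mid-price far enough ahead to compare against.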

Feature Vector Set 5 levels of the LOB

The author measured performance on the training/test sets. To validate the model, the following measures were used:

1. Precision: P = #(correctly labeled y)/ #(y in the predictions)

2. Recall: R = #(correctly labeled y)/ #(y in the sample)

3. F1 measure: F1 = 2PR/(P + R)

4. Balanced Accuracy
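The four measures can be computed directly from true and predicted labels, as in the plain-Python sketch below. The article does not define balanced accuracy, so the sketch assumes the common definition (the mean of the per-class recalls, as used by scikit-learn's balanced_accuracy_score). The function names and toy labels are illustrative.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} using the definitions above."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # #(y in the predictions)
        actual = sum(1 for t in y_true if t == c)     # #(y in the sample)
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true (assumed definition)."""
    scores = per_class_scores(y_true, y_pred)
    present = sorted(set(y_true))
    return sum(scores[c][1] for c in present) / len(present)


# Toy labels, made up for illustration: 0 = up, 1 = down, 2 = stationary
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0, 2]
for cls, (p, r, f1) in per_class_scores(y_true, y_pred).items():
    print(cls, round(p, 3), round(r, 3), round(f1, 3))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))  # 0.75
```

Because balanced accuracy averages recall per class, a model that always predicts the majority class scores poorly on it even when plain accuracy looks high, which matters when stationary labels dominate.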

The author applied various machine learning techniques to the ORCL sample data. Some of the important results and comparisons are listed here. To view the complete analysis, check the attached project report.

The RandomForest had the best accuracy measures (balanced dataset), given that the feature space was nonlinear.

Among the tools tested for Random Forest (10 trees) on the balanced dataset, XGBoost had the best accuracy measures.

The sklearn Random Forest (optimized parameters, balanced dataset) had the best balanced accuracy; its precision, recall and F1 measure are given in the project report.

The author tested a simple strategy using ORCL data. The following table lists the rules and assumptions.


Next Step:

Click on the downloadable button to view the entire 50-page project report. You can also check our EPAT Project Work page and have a look at what our students are building. If you want to learn various aspects of algorithmic trading, check out the Executive Programme in Algorithmic Trading (EPAT). EPAT equips you with the skill sets required to be a successful algo trader. Enroll now!
