The XGBoost library for gradient boosting is designed for efficient multi-core parallel processing.
This allows it to efficiently use all of the CPU cores in your system when training.
In this post you will discover the parallel processing capabilities of XGBoost in Python.
After reading this post you will know:
How to confirm that XGBoost multi-threading support is working on your system.
How to evaluate the effect of increasing the number of threads on XGBoost.
How to get the most out of multithreaded XGBoost when using cross validation and grid search.

Let’s get started.

How to Best Tune Multithreading Support for XGBoost in Python
Photo by Nicholas A. Tonelli, some rights reserved.
The Algorithm that is Winning Competitions

XGBoost is a high-performance implementation of gradient boosting that you can now access directly in Python.
Problem Description: Otto Dataset

In this tutorial we will use the Otto Group Product Classification Challenge dataset.
This dataset is available from Kaggle (you will need to sign up to Kaggle to be able to download it). You can download the training dataset train.zip from the Data page and place the unzipped train.csv file in your working directory.
This dataset describes 93 obfuscated details of more than 61,000 products grouped into 10 product categories (e.g. fashion, electronics, etc.). Input attributes are counts of different events of some kind.
The goal is to make predictions for new products as an array of probabilities for each of the 10 categories, and models are evaluated using multiclass logarithmic loss (also called cross entropy).
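For reference, multiclass logarithmic loss can be computed with scikit-learn. The snippet below is only a minimal sketch, not part of the tutorial code; y_true and y_proba are hypothetical placeholders standing in for integer-encoded labels and a matrix of predicted class probabilities.

# sketch: multiclass logarithmic loss (cross entropy) with scikit-learn
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 3, 9, 1])      # integer-encoded true classes (placeholders)
y_proba = np.full((4, 10), 0.1)      # 10 predicted class probabilities per product
print(log_loss(y_true, y_proba, labels=list(range(10))))  # ~2.303 for uniform predictions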
This competition was completed in May 2015, and the dataset is a good challenge for XGBoost because of the nontrivial number of examples, the difficulty of the problem, and the fact that little data preparation is required (other than encoding the string class variable as integers).
Impact of the Number of Threads

XGBoost is implemented in C++ to explicitly make use of the OpenMP API for parallel processing.
The parallelism in gradient boosting can be applied to the construction of individual trees, rather than to creating trees in parallel as in random forest, because in boosting trees are added to the model sequentially. The speed of XGBoost comes both from this parallelism within the construction of individual trees and from the efficient preparation of the input data, which further speeds up tree construction.
Depending on your platform, you may need to compile XGBoost specifically to support multithreading. See the XGBoost installation instructions for more details.
The XGBClassifier and XGBRegressor wrapper classes, which make XGBoost available to scikit-learn, provide the nthread parameter to specify the number of threads that XGBoost can use during training.
By default this parameter is set to -1 to make use of all of the cores in your system.
model = XGBClassifier(nthread=-1)

Generally, you should get multithreading support for your XGBoost installation without any extra work.
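Note that, depending on the version of the XGBoost scikit-learn wrapper you have installed (a version detail, not something this post relies on), the same setting may be exposed as n_jobs rather than nthread, in which case the equivalent would be:

model = XGBClassifier(n_jobs=-1)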
Depending on your Python environment (e.g. Python 3) you may need to explicitly enable multithreading support for XGBoost. The XGBoost library provides an example if you need help.
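One option you can try at runtime (an assumption on my part rather than something taken from the XGBoost documentation) is to set the standard OpenMP environment variable OMP_NUM_THREADS before XGBoost is imported, since XGBoost's parallelism is built on OpenMP:

# sketch: cap the number of OpenMP threads before importing XGBoost
# OMP_NUM_THREADS is a generic OpenMP setting, not an XGBoost-specific flag
import os
os.environ['OMP_NUM_THREADS'] = '4'

from xgboost import XGBClassifier
model = XGBClassifier(nthread=4)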
You can confirm that XGBoost multi-threading support is working by building a number of different XGBoost models, specifying the number of threads and timing how long it takes to build each model. The trend will both show you that multi-threading support is enabled and give you an indication of the effect it has when building models.
For example, if your system has 4 cores, you can train 4 different models, one for each of 1 to 4 threads, time how long in seconds it takes to create each, then compare the times.
# evaluate the effect of the number of threads
results = []
num_threads = [1, 2, 3, 4]
for n in num_threads:
    start = time.time()
    model = XGBClassifier(nthread=n)
    model.fit(X_train, y_train)
    elapsed = time.time() - start
    print(n, elapsed)
    results.append(elapsed)

We can use this approach on the Otto dataset. The full example is provided below for completeness.
You can change the num_threads list to match the number of cores on your system.
# Otto, tune number of threads
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder
import time
from matplotlib import pyplot
# load data
data = read_csv('train.csv')
dataset = data.values
# split data into X and y
X = dataset[:,0:94]
y = dataset[:,94]
# encode string class values as integers
label_encoded_y = LabelEncoder().fit_transform(y)
# evaluate the effect of the number of threads
results = []
num_threads = [1, 2, 3, 4]
for n in num_threads:
    start = time.time()
    model = XGBClassifier(nthread=n)
    model.fit(X, label_encoded_y)
    elapsed = time.time() - start
    print(n, elapsed)
    results.append(elapsed)
# plot results
pyplot.plot(num_threads, results)
pyplot.ylabel('Speed (seconds)')
pyplot.xlabel('Number of Threads')
pyplot.title('XGBoost Training Speed vs Number of Threads')
pyplot.show()

Running this example summarizes the execution time in seconds for each configuration, for example:
(1, 115.51652717590332)
(2, 62.7727689743042)
(3, 46.042901039123535)
(4, 40.55334496498108)

A plot of these timings is provided below.

XGBoost Tune Number of Threads for Single Model
We can see a nice trend in the decrease in execution time as the number of threads is increased.
If you do not see an improvement in running time for each new thread, you may want to investigate how to enable multithreading support in XGBoost as part of your install or at runtime.
We can run the same code on a machine with a lot more cores. The large Amazon Web Services EC2 instance is reported to have 32 cores. We can adapt the above code to time how long it takes to train the model with 1 to 32 cores.
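For example, the num_threads list in the full example above can be extended to cover all 32 cores. The snippet below is a minimal sketch and assumes that X, label_encoded_y and the data loading steps from the earlier listing are already in place:

# sketch: sweep 1 to 32 threads on a larger machine
import time
from xgboost import XGBClassifier
# X and label_encoded_y are assumed to be prepared as in the full listing above
num_threads = list(range(1, 33))
results = []
for n in num_threads:
    start = time.time()
    model = XGBClassifier(nthread=n)
    model.fit(X, label_encoded_y)
    results.append(time.time() - start)
    print(n, results[-1])

The results are plotted below.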

XGBoost Time to Train Model on 1 to 32 Cores
It is interesting to note that we do not see much improvement beyond 16 threads (at about 7 seconds).