Gradient boosting involves the creation and addition of decision trees sequentially, each attempting to correct the mistakes of the learners that came before it.
This raises the question of how many trees (weak learners or estimators) to configure in your gradient boosting model and how big each tree should be.
In this post you will discover how to design a systematic experiment to select the number and size of decision trees to use on your problem.
After reading this post you will know:
- How to evaluate the effect of adding more decision trees to your XGBoost model.
- How to evaluate the effect of creating larger decision trees in your XGBoost model.
- How to investigate the relationship between the number and depth of trees on your problem.

Let's get started.

How to Tune the Number and Size of Decision Trees with XGBoost in Python
Photo by USFWSmidwest, some rights reserved.
The Algorithm that is Winning Competitions
XGBoost is a high-performance implementation of gradient boosting that you can access directly in Python.
Problem Description: Otto Dataset

In this tutorial we will use the Otto Group Product Classification Challenge dataset.
This dataset is available for free from Kaggle (you will need to sign up to Kaggle to be able to download it). You can download the training dataset train.csv.zip from the Data page and place the unzipped train.csv file into your working directory.
This dataset describes 93 obfuscated features of more than 61,000 products grouped into 10 product categories (e.g. fashion, electronics, etc.). Input attributes are counts of different events of some kind.
The goal is to make predictions for new products as an array of probabilities for each of the 10 categories, and models are evaluated using multiclass logarithmic loss (also called cross entropy).
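To make the metric concrete, multiclass log loss averages the negative log of the probability a model assigned to the true class. A minimal sketch with made-up predictions (not Otto data), using scikit-learn's log_loss:

```python
from sklearn.metrics import log_loss

# three hypothetical products, whose true categories are 0, 2 and 1
y_true = [0, 2, 1]
# predicted probability for each of three categories, one row per product
y_pred = [[0.7, 0.2, 0.1],
          [0.1, 0.1, 0.8],
          [0.2, 0.6, 0.2]]
# mean of -log(0.7), -log(0.8) and -log(0.6), the probabilities of the true classes
print(round(log_loss(y_true, y_pred, labels=[0, 1, 2]), 4))  # → 0.3635
```

Confident predictions that are correct drive the loss toward zero, while confident wrong predictions are penalized heavily.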
This competition was completed in May 2015 and this dataset is a good challenge for XGBoost because of the nontrivial number of examples, the difficulty of the problem and the fact that little data preparation is required (other than encoding the string class variables as integers).
Tune the Number of Decision Trees in XGBoost

Most implementations of gradient boosting are configured by default with a relatively small number of trees, such as hundreds or thousands.
The general reason is that on most problems, adding more trees beyond a limit does not improve the performance of the model.
The reason lies in the way the boosted tree model is constructed: sequentially, with each new tree attempting to model and correct the errors made by the sequence of previous trees. The model quickly reaches a point of diminishing returns.
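The point of diminishing returns is easy to see even without the Otto data. Below is a small sketch using scikit-learn's GradientBoostingClassifier (a stand-in for XGBoost here) on a synthetic dataset, scoring the held-out log loss after each boosting stage:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# synthetic binary classification data as a stand-in for a real dataset
X, y = make_classification(n_samples=2000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

model = GradientBoostingClassifier(n_estimators=300, random_state=7)
model.fit(X_train, y_train)

# held-out log loss after each boosting stage (1 tree, 2 trees, ..., 300 trees)
losses = [log_loss(y_test, proba) for proba in model.staged_predict_proba(X_test)]
print("10 trees: %.4f, 100 trees: %.4f, 300 trees: %.4f"
      % (losses[9], losses[99], losses[299]))
```

The drop in loss from 10 to 100 trees is typically far larger than any change from 100 to 300 trees: most of the gain comes early in the sequence.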
We can demonstrate this point of diminishing returns easily on the Otto dataset.
The number of trees (or rounds) in an XGBoost model is specified to the XGBClassifier or XGBRegressor class in the n_estimators argument. The default in the XGBoost library is 100.
Using scikit-learn we can perform a grid search of the n_estimators model parameter, evaluating a series of values from 50 to 350 with a step size of 50 (50, 100, 150, 200, 250, 300, 350).
```python
# grid search
model = XGBClassifier()
n_estimators = range(50, 400, 50)
param_grid = dict(n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
result = grid_search.fit(X, label_encoded_y)
```

We can perform this grid search on the Otto dataset, using 10-fold cross validation, requiring 70 models to be trained (7 configurations * 10 folds).
The full code listing is provided below for completeness.
```python
# XGBoost on Otto dataset, tune n_estimators
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot
# load data
data = read_csv('train.csv')
dataset = data.values
# split data into X and y
X = dataset[:, 0:94]
y = dataset[:, 94]
# encode string class values as integers
label_encoded_y = LabelEncoder().fit_transform(y)
# grid search
model = XGBClassifier()
n_estimators = range(50, 400, 50)
param_grid = dict(n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
result = grid_search.fit(X, label_encoded_y)
# summarize results
print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stdevs = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stdevs, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
# plot
pyplot.errorbar(n_estimators, means, yerr=stdevs)
pyplot.title("XGBoost n_estimators vs Log Loss")
pyplot.xlabel('n_estimators')
pyplot.ylabel('Log Loss')
pyplot.savefig('n_estimators.png')
```

Running this example prints the following results.
```
Best: -0.001152 using {'n_estimators': 250}
-0.010970 (0.001083) with: {'n_estimators': 50}
-0.001239 (0.001730) with: {'n_estimators': 100}
-0.001163 (0.001715) with: {'n_estimators': 150}
-0.001153 (0.001702) with: {'n_estimators': 200}
-0.001152 (0.001702) with: {'n_estimators': 250}
-0.001152 (0.001704) with: {'n_estimators': 300}
-0.001153 (0.001706) with: {'n_estimators': 350}
```

We can see that the cross validation log loss scores are negative. This is because the scikit-learn cross validation framework inverted them: internally, the framework requires that all metrics being optimized are maximized, whereas log loss is a minimization metric. Inverting the scores makes it one that can be maximized, and the original log loss values are easily recovered by inverting the scores again.
The best number of trees was n_estimators=250, resulting in a log loss of 0.001152, but this is really not a significant difference from n_estimators=200. In fact, there is not a large relative difference in log loss anywhere between 100 and 350 trees, as we can see if we plot the results.
Below is a line graph showing the relationship between the number of trees and the mean (inverted) logarithmic loss, with the standard deviation shown as error bars.

Tune The Number of Trees in XGBoost
Tune the Size of Decision Trees in XGBoost

In gradient boosting, we can control the size of decision trees, also called the number of layers or the depth.
Shallow trees are expected to have poor per