Overfitting is a problem with sophisticated non-linear learning algorithms like gradient boosting.
In this post, you will discover how you can use early stopping to limit overfitting with XGBoost in Python.
After reading this post, you will know:
- About early stopping as an approach to reducing overfitting of training data.
- How to monitor the performance of an XGBoost model during training and plot the learning curve.
- How to use early stopping to prematurely stop the training of an XGBoost model at an optimal epoch.

Let’s get started.

Early Stopping to Avoid Overfitting

Early stopping is an approach to training complex machine learning models to avoid overfitting.
It works by monitoring the performance of the model that is being trained on a separate test dataset and stopping the training procedure once the performance on the test dataset has not improved after a fixed number of training iterations.
It avoids overfitting by attempting to automatically select the inflection point where performance on the test dataset starts to decrease while performance on the training dataset continues to improve as the model starts to overfit.
The performance measure may be the loss function that is being optimized to train the model (such as logarithmic loss), or an external metric of interest to the problem in general (such as classification accuracy).
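To make the procedure concrete, below is a minimal pure-Python sketch of the stopping rule, assuming a lower-is-better metric. The function name, the patience value, and the score sequence are illustrative only and are not part of the XGBoost API.

# Sketch of the early stopping rule: given the test-set metric recorded after
# each training iteration (lower is better), return the iteration at which
# training would stop, i.e. after `patience` iterations without improvement.
def early_stopping_iteration(scores, patience=10):
    best_score = float("inf")
    iterations_without_improvement = 0
    for iteration, score in enumerate(scores):
        if score < best_score:
            best_score = score
            iterations_without_improvement = 0
        else:
            iterations_without_improvement += 1
        if iterations_without_improvement >= patience:
            return iteration  # metric has plateaued; stop here
    return len(scores) - 1  # never triggered; train to the end

# Illustrative error rates that improve, plateau, then degrade.
scores = [0.30, 0.25, 0.22, 0.21, 0.21, 0.22, 0.23, 0.23, 0.24, 0.25, 0.26]
print(early_stopping_iteration(scores, patience=5))  # prints 8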
Monitoring Training Performance With XGBoost

The XGBoost model can evaluate and report on the performance of the model on a test set during training.
It supports this capability by specifying both a test dataset and an evaluation metric on the call to model.fit() when training the model, and by specifying verbose output.
For example, we can report the binary classification error rate ("error") on a standalone test set (eval_set) while training an XGBoost model as follows:
eval_set = [(X_test, y_test)]
model.fit(X_train, y_train, eval_metric="error", eval_set=eval_set, verbose=True)

XGBoost supports a suite of evaluation metrics, including but not limited to:
- "rmse" for root mean squared error.
- "mae" for mean absolute error.
- "logloss" for binary logarithmic loss and "mlogloss" for multi-class log loss (cross entropy).
- "error" for classification error.
- "auc" for area under ROC curve.

The full list is provided in the "Learning Task Parameters" section of the XGBoost Parameters webpage.
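As a quick illustration (a minimal sketch reusing the eval_set from above), a different metric is selected simply by changing the eval_metric string, for example to log loss:

# report binary log loss instead of classification error during training
eval_set = [(X_test, y_test)]
model.fit(X_train, y_train, eval_metric="logloss", eval_set=eval_set, verbose=True)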
For example, we can track the performance of training an XGBoost model on the Pima Indians onset of diabetes dataset, available from the UCI Machine Learning Repository.
The full example is provided below:
# monitor training performance
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=7)
# fit model on training data
model = XGBClassifier()
eval_set = [(X_test, y_test)]
model.fit(X_train, y_train, eval_metric="error", eval_set=eval_set, verbose=True)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Running this example trains the model on 67% of the data and evaluates the model every training epoch on the 33% test dataset.
The classification error is reported each iteration, and the classification accuracy is reported at the end.
The output is provided below, truncated for brevity. We can see that the classification error is reported each training iteration (after each boosted tree is added to the model).
...
[89] validation_0-error:0.204724
[90] validation_0-error:0.208661
[91] validation_0-error:0.208661
[92] validation_0-error:0.208661
[93] validation_0-error:0.208661
[94] validation_0-error:0.208661
[95] validation_0-error:0.212598
[96] validation_0-error:0.204724
[97] validation_0-error:0.212598
[98] validation_0-error:0.216535
[99] validation_0-error:0.220472
Accuracy: 77.95%

Reviewing all of the output, we can see that the model performance on the test set sits flat and even gets worse towards the end of training.
Evaluate XGBoost Models With Learning Curves

We can retrieve the performance of the model on the evaluation dataset and plot it to get insight into how learning unfolded while training.
We provide an array of X and y pairs to the eval_set argument when fitting our XGBoost model. In addition to the test set, we can also provide the training dataset. This will provide a report on how well the model is performing on both the training and test sets during training.
For example:
eval_set = [(X_train, y_train), (X_test, y_test)]
model.fit(X_train, y_train, eval_metric="error", eval_set=eval_set, verbose=True)

In addition, the performance of the model on each evaluation set is stored and made available by the model after training by calling the model.evals_result() function. This returns a dictionary of evaluation datasets and scores, for example:
results = model.evals_result()
print(results)

This will print results like the following (truncated for brevity):
{
    'validation_0': {'error': [0.259843, 0.26378, 0.26378, ...]},
    'validation_1': {'error': [0.22179, 0.202335, 0.196498, ...]}
}

Each of 'validation_0' and 'validation_1' correspond to the order that the datasets were provided to the eval_set argument in the call to fit().
A specific array of results, such as for the first dataset and the error metric, can be accessed as follows:
results['validation_0']['error']

Additionally, we can specify more evaluation metrics to evaluate and collect by providing an array of metrics to the eval_metric argument of the fit() function.
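These stored results can be used to plot learning curves. Below is a minimal plotting sketch, assuming matplotlib is installed and the model was fit with eval_set=[(X_train, y_train), (X_test, y_test)] and eval_metric="error" as shown above:

from matplotlib import pyplot
# retrieve the performance metrics recorded during training
results = model.evals_result()
epochs = len(results['validation_0']['error'])
x_axis = range(0, epochs)
# plot classification error for the training and test sets per boosting round
pyplot.plot(x_axis, results['validation_0']['error'], label='Train')
pyplot.plot(x_axis, results['validation_1']['error'], label='Test')
pyplot.legend()
pyplot.ylabel('Classification Error')
pyplot.title('XGBoost Classification Error')
pyplot.show()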