Ensemble methods are an excellent way to improve predictive performance on your machine learning problems.
Stacked Generalization or stacking is an ensemble technique that uses a new model to learn how to best combine the predictions from two or more models trained on your dataset.
In this tutorial, you will discover how to implement stacking from scratch in Python.
After completing this tutorial, you will know:
- How to learn to combine the predictions from multiple models on a dataset.
- How to apply stacked generalization to a real-world predictive modeling problem.

Let’s get started.

How to Implement Stacking From Scratch With Python
Photo by Kiran Foster, some rights reserved.
Description

This section provides a brief overview of the Stacked Generalization algorithm and the Sonar dataset used in this tutorial.
Stacked Generalization Algorithm

Stacked Generalization or stacking is an ensemble algorithm where a new model is trained to combine the predictions from two or more models already trained on your dataset.
The predictions from the existing models or submodels are combined using a new model, and as such stacking is often referred to as blending, as the predictions from sub-models are blended together.
It is typical to use a simple linear method to combine the predictions of sub-models, ranging from simple averaging or voting through to a weighted sum using linear regression or logistic regression.
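To make the idea concrete, here is a minimal sketch (not part of the tutorial's implementation, and using made-up predictions) of two such combination schemes: a simple majority vote over class labels and a fixed-weight sum over predicted probabilities.

# Illustrative only: two simple ways to combine sub-model predictions.
def majority_vote(predictions):
	# Predict the class label most common among the sub-model predictions
	return max(set(predictions), key=predictions.count)

def weighted_sum(probabilities, weights):
	# Weight each sub-model's predicted probability and threshold at 0.5
	score = sum(p * w for p, w in zip(probabilities, weights))
	return 1 if score >= 0.5 else 0

# Made-up outputs from three and two hypothetical sub-models
print(majority_vote([0, 1, 1]))              # 1
print(weighted_sum([0.4, 0.9], [0.3, 0.7]))  # 1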
Models that have their predictions combined must have skill on the problem, but do not need to be the best possible models. This means that you do not need to tune the sub-models intensively, as long as each model shows some advantage over a baseline prediction.
It is important that sub-models produce different predictions, so-called uncorrelated predictions. Stacking works best when the predictions that are combined are all skillful, but skillful in different ways. This may be achieved by using algorithms that use very different internal representations (trees compared to instances) and/or models trained on different representations or projections of the training data.
In this tutorial, we will look at taking two very different and untuned sub-models and combining their predictions with a simple logistic regression algorithm.
Sonar Dataset

The dataset we will use in this tutorial is the Sonar dataset.
This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders. There are 208 observations.
It is a well-understood dataset. All of the variables are continuous and generally in the range of 0 to 1. The output variable is a string “M” for mine and “R” for rock, which will need to be converted to integers 1 and 0.
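For example, a small helper along the following lines could perform that conversion (an illustrative sketch; the helper name is arbitrary and the mapping of "M" to 1 and "R" to 0 follows the description above):

# Illustrative helper: convert the string class column to integers ('M' -> 1, 'R' -> 0)
def convert_class_column(dataset, column):
	lookup = {'M': 1, 'R': 0}
	for row in dataset:
		row[column] = lookup[row[column]]
	return lookup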
By predicting the class with the most observations in the dataset (M or mines) the Zero Rule Algorithm can achieve an accuracy of about 53%.
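As a quick illustration of that baseline (using a tiny made-up list of labels rather than the real dataset), a Zero Rule classifier simply predicts the most frequent class for every row:

# Zero Rule baseline: always predict the most frequent class in the training labels
def zero_rule_predict(train_labels):
	return max(set(train_labels), key=train_labels.count)

# Toy labels; on the real Sonar data the majority class covers roughly 53% of rows
labels = [1, 1, 1, 0, 0]
prediction = zero_rule_predict(labels)
print(labels.count(prediction) / float(len(labels)) * 100.0)  # 60.0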
You can learn more about this dataset at the UCI Machine Learning repository.
Download the dataset for free and place it in your working directory with the filename sonar.all-data.csv.
Tutorial

This tutorial is broken down into 3 steps:
- Sub-models and Aggregator.
- Combining Predictions.
- Sonar Dataset Case Study.

These steps provide the foundation that you need to understand and implement stacking on your own predictive modeling problems.
1. Sub-models and Aggregator

We are going to use two models as sub-models for stacking and a linear model as the aggregator model.
This part is divided into 3 sections:
- Sub-model #1: k-Nearest Neighbors.
- Sub-model #2: Perceptron.
- Aggregator Model: Logistic Regression.

Each model will be described in terms of the functions used to train the model and a function used to make predictions.
1.1 Sub-model #1: k-Nearest Neighbors

The k-Nearest Neighbors algorithm or kNN uses the entire training dataset as the model.
Therefore training the model involves retaining the training dataset. Below is a function named knn_model() that does just this.
# Prepare the kNN model
def knn_model(train):
	return train

Making predictions involves finding the k most similar records in the training dataset and selecting the most common class value. The Euclidean distance function is used to calculate the similarity between new rows of data and rows in the training dataset.
Below are the helper functions involved in making predictions with a kNN model. The euclidean_distance() function calculates the distance between two rows of data, get_neighbors() locates all neighbors in the training dataset for a new row of data, and knn_predict() makes a prediction from those neighbors for a new row of data.
from math import sqrt

# Calculate the Euclidean distance between two vectors
def euclidean_distance(row1, row2):
	distance = 0.0
	for i in range(len(row1)-1):
		distance += (row1[i] - row2[i])**2
	return sqrt(distance)

# Locate neighbors for a new row
def get_neighbors(train, test_row, num_neighbors):
	distances = list()
	for train_row in train:
		dist = euclidean_distance(test_row, train_row)
		distances.append((train_row, dist))
	distances.sort(key=lambda tup: tup[1])
	neighbors = list()
	for i in range(num_neighbors):
		neighbors.append(distances[i][0])
	return neighbors

# Make a prediction with kNN
def knn_predict(model, test_row, num_neighbors=2):
	neighbors = get_neighbors(model, test_row, num_neighbors)
	output_values = [row[-1] for row in neighbors]
	prediction = max(set(output_values), key=output_values.count)
	return prediction

You can see that the number of neighbors (k) is set to 2 as a default parameter on the knn_predict() function. This number was chosen with a little trial and error and was not tuned.
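As a quick sanity check (on a small made-up dataset rather than the Sonar data), the functions can be exercised together like this:

# Contrived dataset: [x1, x2, class]
dataset = [[1.0, 2.0, 0], [2.0, 3.0, 0], [8.0, 8.0, 1], [9.0, 7.0, 1]]
model = knn_model(dataset)
# The last value of the test row is ignored by euclidean_distance()
print(knn_predict(model, [8.5, 7.5, None]))  # expect 1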
Now that we have the building blocks for a kNN model, let’s look at the Perceptron algorithm.
1.2 Sub-model #2: Perceptron

The model for the Perceptron algorithm is a set of weights learned from the training data.
In order to train the weights, many predictions need to be made on the training data so that error values can be calculated. Therefore, both model training and prediction require a function for making predictions.
Below are the helper functions for implementing the Perceptron algorithm. The perceptron_model() function trains the Perceptron model on the training dataset and perceptron_predict() is used to make a prediction for a row of data.
# Make a prediction with weights
def perceptron_predict(model, row):
	activation = model[0]
	for i in range(len(row)-1):
		activation += model[i + 1] * row[i]
	return 1.0 if activation >= 0.0 else 0.0

# Estimate Perceptron weights using stochastic gradient descent
def perceptron_model(train, l_rate=0.01, n_epoch=5000):
	weights = [0.0 for i in range(len(train[0]))]
	for epoch in range(n_epoch):
		for row in train:
			prediction = perceptron_predict(weights, row)
			error = row[-1] - prediction
			weights[0] = weights[0] + l_rate * error
			for i in range(len(row)-1):
				weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
	return weights

The perceptron_model() function specifies the learning rate and the number of training epochs as default parameters.
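As with kNN, a quick check on a small made-up, linearly separable dataset (not part of the tutorial) shows how the two functions fit together:

# Contrived dataset: [x1, x2, class]
dataset = [[1.0, 1.0, 0], [2.0, 1.5, 0], [7.0, 8.0, 1], [8.0, 7.5, 1]]
weights = perceptron_model(dataset, l_rate=0.1, n_epoch=100)
# The last value of the test row is ignored by perceptron_predict()
print(perceptron_predict(weights, [7.5, 8.0, None]))  # expect 1.0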