The core of many machine learning algorithms is optimization.
Optimization algorithms are used by machine learning algorithms to find a good set of model parameters given a training dataset.
The most common optimization algorithm used in machine learning is stochastic gradient descent.
In this tutorial, you will discover how to implement stochastic gradient descent to optimize a linear regression algorithm from scratch with Python.
After completing this tutorial, you will know:
- How to estimate linear regression coefficients using stochastic gradient descent.
- How to make predictions for multivariate linear regression.
- How to implement linear regression with stochastic gradient descent to make predictions on new data.

Let's get started.

How to Implement Linear Regression With Stochastic Gradient Descent From Scratch With Python
Photo by star5112, some rights reserved.
Description

In this section, we will describe linear regression, the stochastic gradient descent technique and the wine quality dataset used in this tutorial.
Multivariate Linear Regression

Linear regression is a technique for predicting a real value.
Confusingly, these problems where a real value is to be predicted are called regression problems.
Linear regression is a technique where a straight line is used to model the relationship between input and output values. In more than two dimensions, this straight line may be thought of as a plane or hyperplane.
Predictions are made by combining the input values to estimate the output value.
Each input attribute (x) is weighted using a coefficient (b), and the goal of the learning algorithm is to discover a set of coefficients that results in good predictions (y).
y = b0 + b1 * x1 + b2 * x2 + ...

Coefficients can be found using stochastic gradient descent.
Stochastic Gradient Descent

Gradient descent is the process of minimizing a function by following the gradients of the cost function.
This involves knowing the form of the cost as well as the derivative so that from a given point you know the gradient and can move in that direction, e.g. downhill towards the minimum value.
In machine learning, we can use a technique called stochastic gradient descent, which evaluates and updates the coefficients on every iteration, to minimize the error of a model on our training data.
The way this optimization algorithm works is that each training instance is shown to the model one at a time. The model makes a prediction for a training instance, the error is calculated and the model is updated in order to reduce the error for the next prediction. This process is repeated for a fixed number of iterations.
This procedure can be used to find the set of coefficients that results in the smallest error for the model on the training data. On each iteration, the coefficients (b, in machine learning parlance) are updated using the equation:
b = b - learning_rate * error * x

Where b is the coefficient or weight being optimized, learning_rate is a learning rate that you must configure (e.g. 0.01), error is the prediction error for the model on the training data attributed to the weight, and x is the input value.
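To make the update rule concrete, below is a minimal sketch of a single stochastic gradient descent update for one training row, assuming the row stores its input values first and its expected output last. The function name sgd_update is illustrative only and is not part of the tutorial's final code.

```python
# A minimal sketch of one stochastic gradient descent update for one row.
# Assumes row = [x1, x2, ..., y] and coef = [b0, b1, b2, ...].
def sgd_update(row, coef, l_rate):
	# Predict with the current coefficients (the last column is the target)
	yhat = coef[0]
	for i in range(len(row) - 1):
		yhat += coef[i + 1] * row[i]
	# Prediction error for this row
	error = yhat - row[-1]
	# Update the intercept (b0), which has no associated input value
	coef[0] = coef[0] - l_rate * error
	# Update each remaining coefficient using its input value
	for i in range(len(row) - 1):
		coef[i + 1] = coef[i + 1] - l_rate * error * row[i]
	return coef
```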
Wine Quality Dataset

After we develop our linear regression algorithm with stochastic gradient descent, we will use it to model the wine quality dataset.
This dataset comprises the details of 4,898 white wines, including measurements like acidity and pH. The goal is to use these objective measures to predict the wine quality on a scale between 0 and 10.
Below is a sample of the first 5 records from this dataset.
7,0.27,0.36,20.7,0.045,45,170,1.001,3,0.45,8.8,6
6.3,0.3,0.34,1.6,0.049,14,132,0.994,3.3,0.49,9.5,6
8.1,0.28,0.4,6.9,0.05,30,97,0.9951,3.26,0.44,10.1,6
7.2,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.4,9.9,6
7.2,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.4,9.9,6

The dataset must be normalized to values between 0 and 1, as each attribute has different units and in turn different scales.
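One common way to do this is min-max scaling. The sketch below assumes the dataset is held in memory as a list of rows of floats; the function names dataset_minmax and normalize_dataset are illustrative.

```python
# A minimal sketch of min-max normalization, assuming the dataset
# is a list of rows of floats.
def dataset_minmax(dataset):
	# Find the min and max value for each column
	return [[min(col), max(col)] for col in zip(*dataset)]

def normalize_dataset(dataset, minmax):
	# Rescale each column's values to the range 0-1, in place
	for row in dataset:
		for i in range(len(row)):
			row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
```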
By predicting the mean value (Zero Rule Algorithm) on the normalized dataset, a baseline root mean squared error (RMSE) of 0.148 can be achieved.
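For reference, a baseline like this could be computed with a short sketch such as the one below, which predicts the mean of the training targets for every test row and scores the constant prediction with RMSE; the function name zero_rule_rmse is illustrative.

```python
from math import sqrt

# A minimal sketch of the Zero Rule baseline scored with RMSE.
def zero_rule_rmse(train_targets, test_targets):
	# Predict the mean of the training targets for every test row
	mean_value = sum(train_targets) / float(len(train_targets))
	# Root mean squared error of the constant prediction
	squared_errors = [(mean_value - y) ** 2 for y in test_targets]
	return sqrt(sum(squared_errors) / float(len(test_targets)))
```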
You can learn more about the dataset on the UCI Machine Learning Repository.
You can download the dataset and save it in your current working directory with the name winequality-white.csv. You must remove the header information and the empty lines from the end of the file, and convert the “;” value separator to “,” to meet CSV format.
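If you want to load the prepared file in Python, a minimal sketch using the standard csv module might look like the following; it assumes the header and trailing empty lines have already been removed and that all values are numeric.

```python
from csv import reader

# A minimal sketch for loading the prepared winequality-white.csv file.
def load_csv(filename):
	dataset = list()
	with open(filename, 'r') as file:
		for row in reader(file):
			if not row:
				continue  # skip any empty lines
			dataset.append([float(value) for value in row])
	return dataset
```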
Tutorial

This tutorial is broken down into 3 parts:
1. Making Predictions.
2. Estimating Coefficients.
3. Wine Quality Prediction.

This will provide the foundation you need to implement and apply linear regression with stochastic gradient descent on your own predictive modeling problems.
1. Making Predictions

The first step is to develop a function that can make predictions.
This will be needed both in the evaluation of candidate coefficient values in stochastic gradient descent and after the model is finalized and we wish to start making predictions on test data or new data.
Below is a function named predict() that predicts an output value for a row given a set of coefficients.
The first coefficient is always the intercept, also called the bias or b0, as it is standalone and not responsible for a specific input value.
```python
# Make a prediction with coefficients
def predict(row, coefficients):
	yhat = coefficients[0]
	for i in range(len(row)-1):
		yhat += coefficients[i + 1] * row[i]
	return yhat
```

We can contrive a small dataset to test our prediction function.
x, y
1, 1
2, 3
4, 3
3, 2
5, 5

Below is a plot of this dataset.

Small Contrived Dataset For Linear Regression
We can also use previously prepared coefficients to make predictions for this dataset.
Putting this all together we can test our predict() function below.
```python
# Make a prediction with coefficients
def predict(row, coefficients):
	yhat = coefficients[0]
	for i in range(len(row)-1):
		yhat += coefficients[i + 1] * row[i]
	return yhat

dataset = [[1, 1], [2, 3], [4, 3], [3, 2], [5, 5]]
coef = [0.4, 0.8]
for row in dataset:
	yhat = predict(row, coef)
	print("Expected=%.3f, Predicted=%.3f" % (row[-1], yhat))
```

There is a single input value (x) and two coefficient values (b0 and b1). The prediction equation we have modeled for this problem is:
y = b0 + b1 * x

or, with the specific coefficient values we chose by hand:
y = 0.4 + 0.8 * x

Running this function we get predictions that are reasonably close to the expected output (y) values.
Expected=1.000, Predicted=1.200
Expected=3.000, Predicted=2.000
Expected=3.000, Predicted=3.600
Expected=2.000, Predicted=2.800
Expected=5.000, Predicted=4.400

Now we are ready to implement stochastic gradient descent to optimize our coefficient values.
2. Estimating Coefficients

We can estimate the coefficient values for our training data using stochastic gradient descent.
Stochastic