Building Decision Tree Algorithm in Python with scikit-learn


One of the cutest and most lovable supervised algorithms is the decision tree algorithm. It can be used for both classification and regression.

In the previous article, how the decision tree algorithm works, we gave a thorough introduction to the working aspects of the decision tree algorithm. In this article, we are going to build a decision tree classifier in Python using the scikit-learn machine learning package for the balance scale dataset.

In short, this article explains how to implement a decision tree classifier on the Balance Scale dataset. We will program our classifier in Python and use its sklearn library.

How to implement a decision tree classifier in Python with scikit-learn

Decision tree algorithm prerequisites

Before you start building the decision tree classifier in Python, make sure you understand how the decision tree algorithm works. If you don't yet have a basic understanding of the decision tree algorithm, you can spend some time on the How the Decision Tree Algorithm Works article.

Once we have finished modeling the decision tree classifier, we will use the trained model to predict whether the balance scale tips to the right, tips to the left, or is balanced. The great advantage of sklearn is that it lets us implement machine learning algorithms in a few lines of code.

Before getting started, let's quickly look at the assumptions we make while creating a decision tree, and at the decision tree algorithm pseudocode.

Assumptions we make while using a decision tree

1. In the beginning, the whole training set is considered at the root.

2. Feature values are preferred to be categorical. If values are continuous, they are discretized before building the model.

3. Records are distributed recursively on the basis of attribute values.

4. The order of placing attributes as the root or internal nodes of the tree is determined by a statistical measure, such as information gain or the Gini index.

Decision Tree Algorithm Pseudocode

1. Place the best attribute of the dataset at the root of the tree.

2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.

3. Repeat steps 1 and 2 on each subset until you find leaf nodes in all the branches of the tree.
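The pseudocode above can be sketched in plain Python. This is a minimal ID3-style sketch, assuming categorical features and information gain (entropy reduction) as the statistical measure for picking the best attribute. The function names and the toy weather data are hypothetical and only for illustration. Note that scikit-learn's DecisionTreeClassifier does not work exactly this way: it uses an optimized version of CART, which builds binary trees with threshold splits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Recursively build a tree of nested dicts; leaves are class labels."""
    # Leaf: the node is pure, or there are no attributes left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: place the attribute with the highest information gain at this node.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "default": Counter(labels).most_common(1)[0][0], "children": {}}
    # Step 2: split the records into one subset per value of the chosen attribute.
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 3: repeat steps 1 and 2 on each subset with the remaining attributes.
        node["children"][value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return node


def predict(tree, row):
    """Follow the branches matching the record's values down to a leaf."""
    while isinstance(tree, dict):
        child = tree["children"].get(row[tree["attr"]])
        if child is None:  # value never seen during training: use the majority class
            return tree["default"]
        tree = child
    return tree


# Hypothetical toy data: "weather" separates the classes perfectly, "wind" does not.
rows = [
    {"weather": "sunny", "wind": "weak"},
    {"weather": "sunny", "wind": "strong"},
    {"weather": "rainy", "wind": "weak"},
    {"weather": "rainy", "wind": "strong"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["weather", "wind"])
print(tree)
print(predict(tree, {"weather": "rainy", "wind": "weak"}))  # no
```

Here "weather" has an information gain of 1 bit and "wind" has 0, so "weather" becomes the root and both of its subsets are already pure leaves.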

While building our decision tree classifier, we can improve its accuracy by tuning it with different parameters. But this tuning should be done carefully, since it can make our algorithm overfit the training data and ultimately produce a model that generalizes poorly.
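One way to watch for overfitting while tuning is to compare training accuracy with cross-validated accuracy for different values of a size-limiting parameter such as max_depth. The sketch below assumes the balance scale records can be rebuilt from the class rule in the dataset's UCI description (the scale tips toward the side with the larger weight * distance product, and is balanced when they are equal) instead of reading the data file; the depth values tried are illustrative choices, not recommendations.

```python
from itertools import product

import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Balance-scale records rebuilt from the class rule (assumption, see lead-in).
features = ["Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]
records = [
    (lw, ld, rw, rd, "L" if lw * ld > rw * rd else "R" if lw * ld < rw * rd else "B")
    for lw, ld, rw, rd in product(range(1, 6), repeat=4)
]
data = pd.DataFrame(records, columns=features + ["Class Name"])
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["Class Name"], test_size=0.3, random_state=100
)

results = {}
for depth in [2, 3, 4, 6, None]:  # None: grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=100)
    # Mean 5-fold cross-validated accuracy, computed on the training split only.
    cv_accuracy = cross_val_score(clf, X_train, y_train, cv=5).mean()
    clf.fit(X_train, y_train)
    train_accuracy = clf.score(X_train, y_train)
    results[depth] = (train_accuracy, cv_accuracy)
    print(f"max_depth={depth}: train={train_accuracy:.3f}, cross-validated={cv_accuracy:.3f}")

# Pick the depth with the best cross-validated score, then touch the test set once.
best_depth = max(results, key=lambda d: results[d][1])
final = DecisionTreeClassifier(max_depth=best_depth, random_state=100).fit(X_train, y_train)
test_accuracy = final.score(X_test, y_test)
print(f"best max_depth={best_depth}, test accuracy={test_accuracy:.3f}")
```

A fully grown tree (max_depth=None) always reaches a training accuracy of 1.0 here, because every feature combination is unique; a gap between that and the cross-validated score is the overfitting the paragraph above warns about.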

Sklearn Library Installation

Python's sklearn library holds tons of modules that help build predictive models. It contains tools for data splitting, preprocessing, feature selection, and tuning, as well as supervised and unsupervised learning algorithms. It is similar to the caret library in R.

To use it, we first need to install it. The best way to install data science libraries and their dependencies is by installing the Anaconda distribution. You can also install only the most popular machine learning Python libraries.
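After installation (with Anaconda, or with pip, where the package name is scikit-learn), a quick sanity check is to import the three libraries used in this article and print their versions. This is a minimal check; the version numbers on your system will differ.

```python
# Confirm that the libraries used in this article are installed and importable.
import numpy
import pandas
import sklearn

for module in (numpy, pandas, sklearn):
    print(f"{module.__name__} {module.__version__}")
```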

The sklearn library gives us direct access to modules for training our model with different machine learning algorithms, such as the k-nearest neighbors classifier, the support vector machine classifier, decision trees, linear regression, etc.
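What makes switching between these algorithms easy is that sklearn estimators share the same fit/predict/score interface. The sketch below trains three of the classifiers mentioned above on the same split. It assumes the balance scale records are rebuilt from the class rule in the dataset's UCI description (larger weight * distance product wins, equal products balance) rather than loaded from a file, and it uses default-ish hyperparameters chosen only for illustration.

```python
from itertools import product

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Balance-scale records rebuilt from the class rule (assumption, see lead-in).
X, y = [], []
for lw, ld, rw, rd in product(range(1, 6), repeat=4):
    X.append([lw, ld, rw, rd])
    y.append("L" if lw * ld > rw * rd else "R" if lw * ld < rw * rd else "B")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

models = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "support vector machine": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=100),
}

scores = {}
for name, model in models.items():
    # Every estimator exposes the same fit / predict / score methods.
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Linear regression follows the same pattern (sklearn.linear_model.LinearRegression), but it predicts numeric targets, so it is left out of this classification example.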

Balance Scale Data Set Description

The Balance Scale dataset consists of 5 attributes: 4 feature attributes and 1 target attribute. We will try to build a classifier for predicting the Class attribute. The target attribute is the 1st column.

1. Class Name: 3 (L, B, R)

2. Left-Weight: 5 (1, 2, 3, 4, 5)

3. Left-Distance: 5 (1, 2, 3, 4, 5)

4. Right-Weight: 5 (1, 2, 3, 4, 5)

5. Right-Distance: 5 (1, 2, 3, 4, 5)

Index  Variable Name                  Variable Values
1.     Class Name (Target Variable)   "R": balance scale tips to the right
                                      "L": balance scale tips to the left
                                      "B": balance scale is balanced
2.     Left-Weight                    1, 2, 3, 4, 5
3.     Left-Distance                  1, 2, 3, 4, 5
4.     Right-Weight                   1, 2, 3, 4, 5
5.     Right-Distance                 1, 2, 3, 4, 5

The table above summarizes all the attributes of the data.
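To get the data into pandas without depending on a downloaded file, the sketch below rebuilds it from the class rule given in the dataset's UCI Machine Learning Repository description: the scale tips toward the side with the larger weight * distance product and is balanced when the two are equal. Enumerating every combination of the four features (each 1 to 5) gives the 625 records. This is an assumption-based reconstruction; the rows may be in a different order than in the original file, and if you have that file locally you can read it instead, as shown in the commented line.

```python
from itertools import product

import pandas as pd

columns = ["Class Name", "Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]

# Rebuild all 5**4 = 625 records; the class compares the two torques (weight * distance).
records = []
for lw, ld, rw, rd in product(range(1, 6), repeat=4):
    left_torque, right_torque = lw * ld, rw * rd
    if left_torque > right_torque:
        label = "L"
    elif left_torque < right_torque:
        label = "R"
    else:
        label = "B"
    records.append((label, lw, ld, rw, rd))

balance_data = pd.DataFrame(records, columns=columns)
# With a local copy of the data file (the path here is up to you):
# balance_data = pd.read_csv("balance-scale.data", header=None, names=columns)

print(balance_data.shape)  # (625, 5)
print(balance_data["Class Name"].value_counts())

# The target is the first column; the remaining four columns are the features.
X = balance_data.iloc[:, 1:].to_numpy()
Y = balance_data.iloc[:, 0].to_numpy()
```

The class counts come out as 288 "L", 288 "R", and 49 "B", so the balanced class is much rarer than the other two.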

Balance Scale Problem Statement

The problem we are going to address is to model a classifier for evaluating the direction in which the balance scale tips.

Decision Tree classifier implementation in Python with sklearn Library

The modeled decision tree learns splitting rules from the prior records (training data), whose balance scale tip direction is already known, and applies those rules to classify the tip direction of new records.
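To see the rules a fitted tree has learned, sklearn's export_text prints them as nested if/else conditions, and predict applies them to new records. This sketch assumes the balance scale records are rebuilt from the class rule in the dataset's UCI description (larger weight * distance product wins, equal products balance); the tree parameters and the two new records are hypothetical, illustrative choices.

```python
from itertools import product

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Balance-scale records rebuilt from the class rule (assumption, see lead-in).
features = ["Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]
records = [
    (lw, ld, rw, rd, "L" if lw * ld > rw * rd else "R" if lw * ld < rw * rd else "B")
    for lw, ld, rw, rd in product(range(1, 6), repeat=4)
]
data = pd.DataFrame(records, columns=features + ["Class Name"])

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, min_samples_leaf=5, random_state=100)
clf.fit(data[features], data["Class Name"])

# The learned split rules (only the top two levels are printed here).
rules = export_text(clf, feature_names=features, max_depth=2)
print(rules)

# Hypothetical new records, with columns in the same order as the training features.
new_records = pd.DataFrame([[5, 5, 1, 1], [1, 1, 5, 5]], columns=features)
predictions = clf.predict(new_records)
print(predictions)
```

A new record is classified purely by walking these threshold rules from the root to a leaf; the training records themselves are not consulted at prediction time.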

Python packages used

1. NumPy: NumPy is a numeric Python module. It provides fast mathematical functions and robust data structures for efficient computation on multi-dimensional arrays and matrices. We use NumPy to read data files into NumPy arrays and to manipulate the data.

2. Pandas: Pandas provides the DataFrame object for data manipulation, and it can read and write data between different file formats. DataFrames can hold columns of different data types.

3. Scikit-learn: A machine learning library that includes various machine learning algorithms. We use its train_test_split, DecisionTreeClassifier, and accuracy_score utilities.
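Putting the three packages together, a minimal end-to-end sketch looks like the following: split the data with train_test_split, train one DecisionTreeClassifier with the Gini index and one with entropy (information gain), and compare them with accuracy_score. It assumes the balance scale records are rebuilt from the class rule in the dataset's UCI description (larger weight * distance product wins, equal products balance) instead of read from the data file; the 70/30 split, max_depth=3, min_samples_leaf=5 and random_state=100 are illustrative choices, not prescribed values.

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

columns = ["Class Name", "Left-Weight", "Left-Distance", "Right-Weight", "Right-Distance"]
# Balance-scale records rebuilt from the class rule (assumption, see lead-in).
records = [
    ("L" if lw * ld > rw * rd else "R" if lw * ld < rw * rd else "B", lw, ld, rw, rd)
    for lw, ld, rw, rd in product(range(1, 6), repeat=4)
]
balance_data = pd.DataFrame(records, columns=columns)

X = balance_data.iloc[:, 1:].to_numpy()  # features: columns 2 to 5
Y = balance_data.iloc[:, 0].to_numpy()   # target: the first column

# Hold out 30% of the records for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

models, accuracies = {}, {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(
        criterion=criterion, max_depth=3, min_samples_leaf=5, random_state=100
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    models[criterion] = clf
    accuracies[criterion] = accuracy_score(y_test, y_pred)
    print(f"{criterion}: accuracy = {accuracies[criterion] * 100:.2f}%")

# Hypothetical new record: Left-Weight, Left-Distance, Right-Weight, Right-Distance.
new_record = np.array([[4, 4, 3, 2]])
print(models["gini"].predict(new_record))
```

Both criteria usually give similar trees; trying both is a cheap way to check whether the choice of split measure matters for this dataset.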

If you haven't set up a machine learning environment on your system yet, the posts below will be helpful.

Python Machine learning setup in Ubuntu
