K-Nearest Neighbor of Lending Club Issued Loans in Python

This was my final project for my Graduate course, Programming Machine Learning Applications at DePaul University .

What is K-Nearest Neighbor?

In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression.

In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. Source: Wikipedia

Analysis process

The original data set was downloaded from Kaggle , as an aggregate of issued loans from Lending Club through 2007-2015. Lending Club is a US peer-to-peer lending company. The original data set contains 887383 rows and 75 columns.

Due to computing power on my Macbook Pro, I choose to reduce (sample) the data to perform the data analysis to 5% of the original. I also choose to perform some pre-processing by removing categorical variables with high cardinality. I also chose to impute NaN values to zero, as the other option of removing select rows with NaN would results in eliminating the entire data set.

Data Acquisition : I loaded the necessary libraries and download the Zip package containing the CSV file from Kaggle. After viewing the data and its shape I took a random 5% of the data to perform the analysis on. Exploratory data analysis : During this step I perform some descriptive analysis and determined the target variable. I also explored how many classes were in the target and a selection of other possibly problematic (high cardinality) variables. I also visualized the target variable in a histogram which is a good technique for understanding the distribution of the data to assist in parameter tuning. Data Cleaning : I dropped those high cardinality variables during this step as a pre-cursor to the pre-processing step. Pre-processing & Transformation : I removed the target variable from the entire data set and transformed the categorical variable into a model matrix with one-hot encoding. This is sometimes the requirements for certain algorithms to process the data in a sparse matrix format. Other statistical software such as R, automates this step when generating models. I imputed the missing values in the data to 0. I scaled the continuous variables using min-max normalization which transforms values onto a scale from 0 to 1 to prevent variables on different scales heavily impacting the coefficients. Data Partition : I partitioned the pre-processed data into a training and test data set. Modeling : I built a k-NN classifier model, using 10 neighbor classes and the euclidean distance. Evaluation : I scored the classifier on unseen test data and calculated the R squared values for both the training and test data. A confusion matrix and classification report were generated as seen below. Results Class precision recall f1-score support Charged Off 0.69 0.12 0.2 663 Current 0.8 0.96 0.88 9101 Default 0 0 0 20 Does not meet the credit policy. Status:Charged Off 0 0 0 7 Does not meet the credit policy. Status:Fully Paid 0 0 0 24 Fully Paid 0.81 0.6 0.69 3069 In Grace Period 0 0 0 95 Issued 0 0 0 147 Late (16-30 days) 0 0 0 39 Late (31-120 days) 0 0 0 146 avg / total 0.77 0.8 0.77 13311 Jupyter Notebook

K-Nearest Neighbor of Lending Club Issued Loans in Python

Trending Articles

SM3268AB 8CE三星量产无法格式化

[下载工具]Think4V utubedown(Youtube高清视频下载工具) v2.1.6 官方版2.1.3

出售: SINE Othello 電源線

博讯｜张磊帮助下，李源潮的儿子被耶鲁录取

FullEventLogView 1.73 免安裝中文版 - 事件檢視器取代工具

同門四角戀？李沛旭喇舌「小郭雪芙」曾智希，蔡淑臻拍完婚紗...怒毀婚

五代RAV4 降車身（機械車位因素）

[攻略] 《魔獸世界》6.2.2 白色魚人蛋再現！來去收編魚人寶寶特基！

jetBrains Product crack 2024 Java based

2013 KUGA 6G轉動方向盤會聽到摳摳摳的異音，有人知道原因嗎?

【豌豆字幕組】[藥屋少女的呢喃（藥師少女的獨語）/ Kusuriya no Hitorigoto][25][繁體][1080P][MP4]

好用的照片后期处理软件【DxO PhotoLab Elite 5.4.0.4765 (x64) 多语言便携版】..

出售: Thixar Silence Plus 啫喱板

df-dferh-01 中国区 Android 安装 Google Play Store 后报错的解决办法

三條崙討海人故事…重建烏倉寮憶43年前船難

致喬立建設道歉聲明

[一般] 神州全地圖掉寶資料

方易通7862 8/128G 無360 刷機

動感校園小記者・瑪利諾修院學校｜採訪王瑋駿陳晞文帶領試玩風帆

有藍電流行車紀錄器分享文嗎