
Recommender Systems using Deep Learning in PyTorch from scratch

Photo by Susan Yin on Unsplash

Recommender systems (RS) have been around for a long time, and recent advances in deep learning have made them even more exciting. Matrix factorization algorithms have been the workhorse of RS. In this article, I would assume that you are vaguely familiar with collaborative filtering based methods and have basic knowledge about training a neural network in PyTorch.

In this post, my goal is to show you how to implement a RS in PyTorch from scratch. The theory and model presented in this article were made available in this paper . Here is the GitHub repository for this article.

Problem Definition

Given a past record of movies seen by a user, we will build a recommender system that helps the user discover movies of their interest.

Specifically, given <userID, itemID> occurrence pairs, we need to generate a ranked list of movies for each user.

We model the problem as a binary classification problem , where we learn a function to predict whether a particular user will like a particular movie or not.


Figure 1: Our model will learn this mapping

Dataset

We use the MovieLens 100K dataset, which has 100,000 ratings from 1000 users on 1700 movies. The dataset can be downloaded from here .

The ratings are given to us in the form of <userID, itemID, rating, timestamp> tuples. Each user has a minimum of 20 ratings.

Training

We drop the exact value of the rating (1, 2, 3, 4, 5) and instead convert the data to an implicit scenario, i.e. any positive interaction is given a value of 1. All other interactions are given a value of zero by default.
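This conversion can be sketched in a few lines. A minimal sketch, assuming the ratings arrive as plain tuples (the function name `to_implicit` is illustrative, not from the article's repository):

```python
# Convert explicit ratings to implicit feedback: any observed
# interaction becomes a positive label of 1, regardless of rating value.
def to_implicit(ratings):
    """ratings: iterable of (userID, itemID, rating, timestamp) tuples."""
    return [(user, item, 1) for (user, item, rating, ts) in ratings]

sample = [(1, 50, 4, 874965758), (1, 172, 5, 874965478)]
print(to_implicit(sample))  # [(1, 50, 1), (1, 172, 1)]
```

Unobserved user-item pairs are not materialized here; they are treated as zeros implicitly when negatives are sampled.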

Since we are training a classifier, we need both positive and negative samples. The records present in the dataset are counted as positive samples. We assume that all entries missing from the user-item interaction matrix are negative samples (a strong assumption, but easy to implement).

For every item a user has interacted with, we randomly sample 4 items that the user has not interacted with. This way, a user with 20 positive interactions will have 80 negative interactions. These negative samples never include a positive interaction by the user, though they may not all be unique due to random sampling.
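The sampling procedure can be sketched as follows. This is a minimal illustration, not the repository's exact implementation; the function name and signature are assumptions:

```python
import random

def sample_negatives(user_positives, num_items, ratio=4, seed=0):
    """For each positive item of a user, draw `ratio` items the user never
    interacted with. Duplicates may occur (sampling is with replacement),
    but a negative is never one of the user's positives."""
    rng = random.Random(seed)
    positives = set(user_positives)
    negatives = []
    for _ in user_positives:
        for _ in range(ratio):
            item = rng.randrange(num_items)
            while item in positives:          # reject positives, resample
                item = rng.randrange(num_items)
            negatives.append(item)
    return negatives

negs = sample_negatives([3, 7, 11], num_items=100)
print(len(negs))  # 12: four negatives per positive
```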

Evaluation

For each user, we randomly sample 100 items that the user has not interacted with and rank the held-out test item among these 100 items. The same strategy is used in the paper that inspired this post (referenced below). We truncate the ranked list at 10.
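Building the per-user candidate list might look like this. A sketch under the stated protocol; the helper name `build_eval_list` is an assumption:

```python
import random

def build_eval_list(test_item, user_positives, num_items, n=100, seed=0):
    """Pair the held-out test item with `n` distinct items the user never
    interacted with; the model then ranks these n + 1 candidates."""
    rng = random.Random(seed)
    positives = set(user_positives) | {test_item}
    candidates = []
    while len(candidates) < n:
        item = rng.randrange(num_items)
        if item not in positives and item not in candidates:
            candidates.append(item)
    return [test_item] + candidates

eval_list = build_eval_list(test_item=5, user_positives=[1, 2, 3], num_items=1700)
print(len(eval_list))  # 101 candidates to rank
```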

Ranking all items for every user would be too time-consuming: we would have to score 1000 × 1700 ≈ 1.7 × 10⁶ user-item pairs. With this strategy, we only need 1000 × 100 = 10⁵ values, an order of magnitude less.

For each user, we hold out the latest rating (according to timestamp) as the test set and use the rest for training. This evaluation methodology, known as the leave-one-out strategy, is the same as in the reference paper.
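The leave-one-out split can be sketched like this (a minimal illustration, assuming the tuple layout described above; not the repository's exact code):

```python
from collections import defaultdict

def leave_one_out(ratings):
    """Hold out each user's most recent interaction (by timestamp) as the
    test item; everything else goes to the training set."""
    by_user = defaultdict(list)
    for user, item, rating, ts in ratings:
        by_user[user].append((ts, item, rating))
    train, test = [], {}
    for user, rows in by_user.items():
        rows.sort()                          # oldest -> newest
        *history, (ts, item, rating) = rows  # last row is the newest
        test[user] = item                    # single held-out test item
        train.extend((user, i, r) for (t, i, r) in history)
    return train, test

ratings = [(1, 10, 5, 100), (1, 20, 4, 200), (2, 30, 3, 50)]
train, test = leave_one_out(ratings)
print(test)  # {1: 20, 2: 30}
```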

Metrics

We use Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) to evaluate the performance of our RS.

Our model gives a confidence score between 0 and 1 for each item in a user's test set. The items are sorted in decreasing order of score, and the top 10 are returned as recommendations. If the held-out test item (exactly one per user) is present in this list, HR is one for that user; otherwise it is zero. The final HR is reported after averaging over all users. A similar calculation is done for NDCG.
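With a single relevant item per user, both metrics reduce to simple formulas. A minimal sketch (NDCG simplifies to 1/log₂(rank + 2) for a 0-based rank, since the ideal DCG is 1):

```python
import math

def hit_ratio(ranked_items, test_item, k=10):
    """1.0 if the held-out item appears in the top-k, else 0.0."""
    return 1.0 if test_item in ranked_items[:k] else 0.0

def ndcg(ranked_items, test_item, k=10):
    """With one relevant item, NDCG@k = 1 / log2(position + 2),
    where position is the 0-based rank of the test item."""
    if test_item in ranked_items[:k]:
        position = ranked_items.index(test_item)
        return 1.0 / math.log2(position + 2)
    return 0.0

print(hit_ratio([5, 3, 9], 3))  # 1.0: item 3 is in the top 10
print(ndcg([5, 3, 9], 5))       # 1.0: test item ranked first
```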

While training, we minimize the cross-entropy loss, the standard loss function for a classification problem. The real strength of an RS, however, lies in producing a ranked list of the top-k items a user is most likely to interact with. Think about why you mostly click on Google search results on the first page and rarely go further. Metrics like NDCG and HR capture this phenomenon by measuring the quality of our ranked lists. Here is a good introduction on evaluating recommender systems .

Baseline: Item Popularity model

A baseline model is one we use to provide a first-cut, easy, non-sophisticated solution to the problem. In many use cases for recommender systems, recommending the same list of the most popular items to all users gives a tough-to-beat baseline.

In the GitHub repository, you will also find the code for implementing item popularity model from scratch. Below are the results for the baseline model.
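The baseline fits in a few lines. A minimal sketch, not the repository's implementation; the function name is an assumption:

```python
from collections import Counter

def popularity_recommender(train_pairs, k=10):
    """Rank items by interaction count in the training data and
    recommend the same top-k list to every user."""
    counts = Counter(item for (user, item) in train_pairs)
    return [item for item, _ in counts.most_common(k)]

train_pairs = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (1, 30)]
print(popularity_recommender(train_pairs, k=2))  # [10, 20]
```

Despite needing no learning at all, this model already achieves the scores reported in the Results section below.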

Deep Learning based model

With all the fancy architecture and jargon of neural networks, we aim to beat this item popularity model.

Our next model is a deep multi-layer perceptron (MLP). The input to the model is userID and itemID, which is fed into an embedding layer. Thus, each user and item is given an embedding. There are multiple dense layers afterward, followed by a single neuron with a sigmoid activation. The exact model definition can be found in the file MLP.py .
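A condensed sketch of such a model follows. The layer widths here are illustrative; the exact architecture is in MLP.py in the repository:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """User and item IDs are embedded, concatenated, and passed through
    dense layers down to a single sigmoid output."""
    def __init__(self, num_users, num_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        self.layers = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),  # probability of interaction
        )

    def forward(self, user_ids, item_ids):
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.layers(x).squeeze(-1)

model = MLP(num_users=1000, num_items=1700)
scores = model(torch.tensor([0, 1]), torch.tensor([10, 20]))
print(scores.shape)  # one score per (user, item) pair
```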

The output of the sigmoid neuron can be interpreted as the probability the user is likely to interact with an item. It is interesting to observe that we end up training a classifier for the task of recommendation.


Figure 2: The architecture for Neural Collaborative Filtering

Our loss function is binary cross-entropy. We use Adam for gradient descent and the L2 norm for regularization.
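One optimization step, put together, looks roughly like this. A sketch with a stand-in model (L2 regularization is applied via Adam's `weight_decay` parameter; the learning rate and batch here are illustrative):

```python
import torch
import torch.nn as nn

# Stand-in for the real user/item model: any module that outputs
# sigmoid probabilities works with BCELoss.
model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
criterion = nn.BCELoss()

features = torch.randn(16, 8)                 # a batch of inputs
labels = torch.randint(0, 2, (16,)).float()   # 1 = positive, 0 = sampled negative

optimizer.zero_grad()
preds = model(features).squeeze(-1)
loss = criterion(preds, labels)               # binary cross-entropy
loss.backward()
optimizer.step()
print(loss.item())
```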

Results

For the popularity based model, which takes less than 5 seconds to train, these are the scores:

HR = 0.4221 | NDCG = 0.2269

For the deep learning model, we obtain these results after nearly 30 epochs of training (~3 minutes on CPU):

HR = 0.6013 | NDCG = 0.3294

The results are exciting. There is a huge jump in metrics we care about. We observe a 30% reduction in error according to HR, which is huge. These numbers are obtained from a very coarse hyper-parameter tuning. It might still be possible to extract more juice by hyper-parameter optimization.

Conclusion

State of the art algorithms for matrix factorization, and much more, can be easily replicated using neural networks. For a non-neural perspective, read this excellent post about matrix factorization for recommender systems .

In this post, we saw how neural networks offer a straightforward way of building recommender systems. The trick is to think of the recommendation problem as a classification problem.
