Data Mining Coursera for Popular Courses with Python

In just a few years, massive open online courses, commonly referred to as MOOCs, have exploded in popularity. These online courses provide high-quality content taught by leading professors in their respective fields. With more than 20 million students, Coursera is one of the leaders of the MOOC movement. It provides a platform that connects leading universities with students worldwide. I took my first Coursera course back in 2012, and since then I have tried to take one course a month.

In 2015, Coursera has around 1000 courses offered by 100+ universities in more than 10 languages. With this many courses, it has become difficult to decide which course to take.

In this tutorial, I will combine Coursera data with social media data to assess the popularity of courses. To do this, I will use the Coursera API to retrieve the course catalogue, the sharecount.com API to get social media metrics for each course, and Python's pandas library to query and order the courses by popularity.

The source code for this tutorial can be found in this GitHub repository.

1. Getting courses data

Coursera provides an API for accessing the courses, universities, categories, instructors and sessions data. For this tutorial, we will be using the courses, universities and categories data.

In a Python shell, we start by importing the following three libraries: urllib2, json, and pandas. Both urllib2 and json are part of the Python standard library and don't need to be installed separately.

In[1]:

import urllib2
import json
import pandas as pd

Next, we access the Coursera API and download the course catalogue. For each course, we are interested in three fields:

shortName : The short name associated with the course.
name : The course name or title.
language : The language code for the course (e.g. 'en' means English).

We also pass universities and categories in the includes parameter of the query. This returns the ids that link each course to its corresponding universities and categories. Below are the Python commands to do so.

In[2]:

courses_response = urllib2.urlopen('https://api.coursera.org/api/catalog.v1/courses?fields=shortName,name,language&includes=universities,categories')
courses_data = json.load(courses_response)
courses_data = courses_data['elements']
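The query string used above can also be assembled programmatically, which makes it easier to change the requested fields. The sketch below is only an illustration and not part of the tutorial's code: build_catalog_url is a hypothetical helper name, and it is written for Python 3 (urllib.parse), whereas the tutorial itself runs on Python 2 with urllib2.

```python
from urllib.parse import urlencode

# Base URL taken from the request used in the tutorial.
CATALOG_BASE = 'https://api.coursera.org/api/catalog.v1'


def build_catalog_url(resource, fields=(), includes=()):
    """Build a catalog URL for a resource such as 'courses'.

    Hypothetical helper for illustration; not part of any Coursera client.
    """
    params = {}
    if fields:
        params['fields'] = ','.join(fields)
    if includes:
        params['includes'] = ','.join(includes)
    url = '{}/{}'.format(CATALOG_BASE, resource)
    if params:
        # safe=',' keeps commas literal instead of encoding them as %2C.
        url += '?' + urlencode(params, safe=',')
    return url


courses_url = build_catalog_url(
    'courses',
    fields=['shortName', 'name', 'language'],
    includes=['universities', 'categories'],
)
print(courses_url)
```

The printed URL should match the one passed to urlopen in the tutorial, so the helper only changes how the string is produced, not what is requested.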

courses_data is now a list with one dictionary per course. To get the data about the first course in the list, we simply execute the command below.

In[3]:

courses_data[0]

Out[3]:

{u'id': 2163,
 u'language': u'en',
 u'links': {u'categories': [8, 10, 19, 20], u'universities': [65]},
 u'name': u'The Land Ethic Reclaimed: Perceptive Hunting, Aldo Leopold, and Conservation',
 u'shortName': u'perceptivehunting'}

The first course in the courses_data list is 'The Land Ethic Reclaimed: Perceptive Hunting, Aldo Leopold, and Conservation', offered in English by the university with id=65, and listed under categories 8, 10, 19 and 20.

Next, we retrieve the universities and categories data from the Coursera API. For the universities data, we are interested in the university name and its location.

In[4]:

universities_response = urllib2.urlopen('https://api.coursera.org/api/catalog.v1/universities?fields=name,locationCountry')
universities_data = json.load(universities_response)
universities_data = universities_data['elements']

We can get the data about the first university from universities_data by executing the following command:

In[5]:

universities_data[0]

Out[5]:

{u'id': 234, u'links': {}, u'locationCountry': u'CN', u'name': u"Xi'an Jiaotong University", u'shortName': u'xjtu'}

Similarly, we can get the course categories data by executing the following commands.

In[6]:

categories_response = urllib2.urlopen('https://api.coursera.org/api/catalog.v1/categories')
categories_data = json.load(categories_response)
categories_data = categories_data['elements']

We can get the data about the first category from categories_data by executing the following command:

In[7]:

categories_data[0]

Out[7]:

{u'id': 5, u'links': {}, u'name': u'Mathematics', u'shortName': u'math'}
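The three downloads in this section follow the same pattern: open the URL, parse the JSON, and keep the 'elements' list. That pattern could be factored into a single function, as in the sketch below. This is an illustration under assumptions rather than the tutorial's code: fetch_elements and fake_opener are hypothetical names, the code targets Python 3 (urllib.request instead of urllib2), and the opener is injectable so the example runs without network access.

```python
import io
import json
from urllib.request import urlopen


def fetch_elements(url, opener=urlopen):
    """Open a catalog URL and return the list stored under 'elements'.

    `opener` defaults to urlopen but can be swapped for an offline stand-in.
    """
    with opener(url) as response:
        return json.load(response)['elements']


def fake_opener(url):
    """Offline stand-in that serves the category record shown in Out[7]."""
    payload = {'elements': [
        {'id': 5, 'links': {}, 'name': 'Mathematics', 'shortName': 'math'},
    ]}
    return io.BytesIO(json.dumps(payload).encode('utf-8'))


categories = fetch_elements(
    'https://api.coursera.org/api/catalog.v1/categories',
    opener=fake_opener,
)
print(categories[0]['name'])
```

With the default opener, the same call would perform a real HTTP request; the courses and universities URLs from the earlier steps could be passed in the same way.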

2. Structuring the data

In this section, we will structure courses_data, universities_data and categories_data into pandas DataFrames, and we will map the university and category ids to the corresponding names. By the end of this section, we will have one pandas DataFrame called courses_df that holds all the necessary data in a well-structured format.
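To make the id-to-name mapping concrete, here is one possible way to do it with plain dictionaries. It is a sketch only and not necessarily the approach the tutorial takes; build_name_lookup and names_for are hypothetical helper names, and the sample records are the ones printed in section 1.

```python
# Sample records copied from the Out[5] and Out[7] outputs in section 1.
sample_universities = [
    {'id': 234, 'links': {}, 'locationCountry': 'CN',
     'name': "Xi'an Jiaotong University", 'shortName': 'xjtu'},
]
sample_categories = [
    {'id': 5, 'links': {}, 'name': 'Mathematics', 'shortName': 'math'},
]


def build_name_lookup(records):
    """Return a dict that maps each record's id to its name."""
    return {record['id']: record['name'] for record in records}


def names_for(ids, lookup):
    """Translate a list of ids to names, skipping ids missing from the lookup."""
    return [lookup[i] for i in ids if i in lookup]


university_names = build_name_lookup(sample_universities)
category_names = build_name_lookup(sample_categories)

# Hypothetical id lists for illustration; id 999 is deliberately unknown.
print(names_for([234], university_names))
print(names_for([5, 999], category_names))
```

Skipping unknown ids is a design choice made here for simplicity; raising an error or inserting a placeholder would be equally reasonable, depending on how complete the downloaded data is.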

2.1. Putting the data into Pandas DataFrames

First, we start by creating a pandas DataFrame for the courses data.

In[8]:

courses_df = pd.DataFrame()

Next, we add the course_name , course_language , course_short_name , categories and universities columns to the courses_df DataFrame.

In[9]:

# Each column is built by pulling one field out of every course record.
courses_df['course_name'] = map(lambda course_data: course_data['name'], courses_data)
courses_df['course_language'] = map(lambda course_data: course_data['language'], courses_data)
courses_df['course_short_name'] = map(lambda course_data: course_data['shortName'], courses_data)
# Some courses have no categories or universities in 'links'; default to an empty list.
courses_df['categories'] = map(lambda course_data: course_data['links']['categories'] if 'categories' in course_data['links'] else [], courses_data)
courses_df['universities'] = map(lambda course_data: course_data['links']['universities'] if 'universities' in course_data['links'] else [], courses_data)
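The column-by-column extraction above can also be expressed as one function that turns a course record into a flat row. The sketch below is an alternative illustration, not the tutorial's code: to_row is a hypothetical helper name, the first sample record is the one shown in Out[3], and the second record is made up to show the empty-list default. It avoids map, so it behaves the same on Python 2 and Python 3 (where map returns an iterator rather than a list).

```python
sample_courses = [
    # Record copied from the Out[3] output.
    {'id': 2163, 'language': 'en',
     'links': {'categories': [8, 10, 19, 20], 'universities': [65]},
     'name': 'The Land Ethic Reclaimed: Perceptive Hunting, '
             'Aldo Leopold, and Conservation',
     'shortName': 'perceptivehunting'},
    # Hypothetical record with empty links, for illustration only.
    {'id': 0, 'language': 'en', 'links': {},
     'name': 'Hypothetical course without links',
     'shortName': 'hypothetical'},
]


def to_row(record):
    """Flatten one course record into the columns used by courses_df."""
    links = record.get('links', {})
    return {
        'course_name': record['name'],
        'course_language': record['language'],
        'course_short_name': record['shortName'],
        # Fall back to an empty list when a link type is absent.
        'categories': links.get('categories', []),
        'universities': links.get('universities', []),
    }


rows = [to_row(record) for record in sample_courses]
print(rows[0]['categories'])
print(rows[1]['categories'])
```

A list of dictionaries like rows can be passed directly to pd.DataFrame, which creates one column per key.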

We can print the first 5 rows from the courses_df DataFrame by executing the command below.

In[10]:

courses_df.head()

Out[10]:

   course_name                                         course_language  course_short_name  categories       universities
0  The Land Ethic Reclaimed: Perceptive Hunting, ...  en               perceptivehunting  [8, 10, 19, 20]  [65]
1  Contraception: Choices, Culture and Consequences    en               contraception      [3, 8]           [10]
2
