Accomplishments Aug 2016

As a reminder, a one-page summary of all the courses, books & videos I’ve reviewed in the past year can be found on my Journey Roadmap page.

It’s been a summer of incredible transition for me as I’ve made a permanent move from the relatively chilly climate of New York (old house shown to the right) to the equatorial heat misery of South Carolina. I can only hope that this investment pays off in the winter when I’m enjoying a balmy 50-degree day while the Northeast shovels out of a blizzard.

I’ve not posted an “Accomplishments” blog since May, but that certainly shouldn’t indicate that I’ve not been pursuing Data Science over the summer. Far from it! Although I didn’t complete any new courses or books in June and July, when I wasn’t busy packing up or tossing out all of my life’s possessions, I took advantage of the time to revisit a lot of the topics I’d covered in the past year. I began creating hundreds of Mnemosyne flashcards to sharpen my skillset. I retook the UoW Machine Learning: Regression course, going over all code examples in painstaking detail. I also re-read every word of “An Introduction to Statistical Learning with Applications in R”, working through all of the R labs and exercises and incorporating sample code into my Mnemosyne card set. It was an absolutely necessary activity, and I feel much stronger as a result. Consider revisiting some old courses you’ve taken; you’d be surprised that you can still get something new from them on a second or third pass.

In August, however, with the move complete, a number of endeavors came to a successful close.

Completed Items

Coursera Machine Learning: Clustering and Retrieval

This is the fourth course in the University of Washington Machine Learning Specialization on Coursera. Grouping and association were the theme here. Diving into large datasets of Wikipedia article entries, we found commonality between groups of articles, implemented various measures of “alikeness”, assigned articles to topics based on word groupings and made predictions on new articles based on models built from large training sets.
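To make the idea of an “alikeness” measure concrete, here is a minimal sketch of one common choice, cosine similarity over word counts. It is not the course's code: the function names and the three tiny sample "articles" are made up for illustration, and the course may well have used other representations (such as TF-IDF) and other distance measures.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a simple bag-of-words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up stand-ins for article text, not data from the course.
articles = {
    "guitar": "the guitar is a string instrument played with the fingers",
    "violin": "the violin is a string instrument played with a bow",
    "volcano": "a volcano erupts when magma rises through the crust",
}

# Score every other article against the "guitar" article and pick the closest.
query = word_counts(articles["guitar"])
scores = {
    name: cosine_similarity(query, word_counts(text))
    for name, text in articles.items()
    if name != "guitar"
}
nearest = max(scores, key=scores.get)
```

Raw counts let filler words like “the” and “a” dominate the score, which is one reason weighting schemes such as TF-IDF are usually preferred for real articles.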

By far, this was the most challenging course of the series to date. It covered a number of topics I’ve seen before, such as Nearest-Neighbor searches, k-means Clustering and dendrograms. But they also went into depth on a number of topics that I’d not seen before. These included

Locality-sensitive hashing for approximate NN search
KD trees
Gaussian mixture models
The EM algorithm
Latent Dirichlet allocation and Gibbs sampling
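As a rough illustration of the first item, the sketch below shows one standard flavor of locality-sensitive hashing: random hyperplanes, where each vector gets one bit per plane depending on which side it falls on, and only vectors sharing a bucket are compared exactly. This is a plain-Python sketch under my own assumptions, not the course's implementation; every name and the toy vectors are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(p * x for p, x in zip(plane, vector)) >= 0)
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item names into buckets keyed by their bit signature."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[signature(vec, hyperplanes)].append(name)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return only the items in the query's own bucket (the approximate step)."""
    return buckets.get(signature(query, hyperplanes), [])


# Toy vectors: one base vector, a scaled copy, and its mirror image.
vectors = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],
    "a_flipped": [-1.0, -2.0, -3.0],
}
planes = make_hyperplanes(num_planes=8, dim=3)
index = build_index(vectors, planes)
found = candidates([1.0, 2.0, 3.0], index, planes)
```

Vectors pointing in the same direction always share a bucket, while nearby but not identical vectors can land in different ones, so practical versions typically also search neighboring buckets or use several hash tables.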

There was a lot of ground covered and the message boards did reflect some frustration with how much content was packed into each week. From my point of view, for $79, the more content the better! I honestly feel like I grasped about 80% of some of the advanced concepts, but I do feel like I was left with a solid foundation from which to pursue further inquiry.

The assignments, as usual, were self-guided IPython notebooks that walked you through the process of implementing a number of the mathematical algorithms with Python code. If taken slowly enough and with constant reference to notes taken during the lectures, they were achievable and added to the educational experience.

I will observe that questions posted in the forums went unanswered far more frequently than in previous courses. While there were fellow students who were incredibly helpful, you could find yourself on your own if you were seeking assistance on a question.

Finally, there was a bit of drama during this course in that Turi (formerly Dato, formerly GraphLab), the company founded by one of the course creators, was acquired by Apple. It was questioned whether the primary product used in this course, GraphLab Create, would remain available to students or whether the professors themselves would even deliver on the remaining two courses (Recommender Systems & Dimension Reduction as well as the Capstone Project). If I sold a company off to Apple, I might just take my bags of cash to Tahiti and live out my life sipping Banana Daiquiris.

edX Berkeley U CS105x Introduction to Apache Spark

My studies in the past year have followed 4 different tracks: R, Python, Mathematics and the newcomer, Apache Spark. I initially was going to stay focused on the data analysis and algorithm techniques and not dive into the Big Data world (which is its own complementary specialty). However, out of curiosity, I took the short Udemy course, Taming Big Data with Apache Spark and Python back in May and found myself pleasantly surprised at how much I was intrigued by this technology.

The University of California, Berkeley has developed their own edX trilogy around this platform called Data Science and Engineering with Apache Spark. They’ve partnered with Databricks to bring students free access to Spark servers hosted on Amazon Web Services for a year.

The first course I completed was CS105x Introduction to Apache Spark. There are two parts to this 3-week course (although you are given 6 weeks to complete all the material at your own pace):

Short video lectures are given by Dr. Anthony D. Joseph, who has the most awkward teleprompter reading style. He pretty much reads the slides to you and doesn’t add much from his personal appearance. The content is very basic, and served as a nice counterbalance to the brain bruiser UoW Machine Learning course I was taking at the same time. You have to answer quiz questions based on the lectures, and they seemed more based on trivia contained in the lecture than on any real core concept.
Fortunately, these videos were only a small part of what was offered. The true value was in the self-paced IPython Notebook labs. These labs are where the real learning took place.

They truly guide you from basic Spark commands (for which you needed some familiarity with Python), through small working examples that process the entire works of Shakespeare, to the final lab, where you analyzed a month’s worth of NASA web logs.

I found these labs to be very clear with a straightforward progression of skillsets. By the end, you are cut free to perform your own analysis on a large dataset.
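For a flavor of the kind of text processing a Shakespeare exercise might involve, here is a word-count sketch in plain Python. To be clear about the assumptions: this does not use Spark or the Databricks environment, it assumes the exercise was word-count style (the labs themselves aren't reproduced here), and the two sample lines and helper names are mine. It only mirrors the general shape of mapping each line to words and then reducing to totals.

```python
import re
from collections import Counter
from functools import reduce

# Two sample lines standing in for a much larger text file.
lines = [
    "To be, or not to be, that is the question",
    "Whether 'tis nobler in the mind to suffer",
]


def tokenize(line):
    """'Map' step: turn one line into a list of lowercase words."""
    return re.findall(r"[a-z']+", line.lower())


def merge(total, line_words):
    """'Reduce' step: fold one line's words into the running totals."""
    total.update(line_words)
    return total


word_totals = reduce(merge, map(tokenize, lines), Counter())
top_three = word_totals.most_common(3)
```

On a single machine this is just a loop in disguise; the appeal of a framework like Spark is that the same map-then-reduce shape can be spread across many machines when the input no longer fits on one.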

This was one of the few MOOCs I’ve encountered where there was very active participation from t
