As a reminder, a one-page summary of all the courses, books & videos
I’ve reviewed in the past year can be found on my Journey Roadmap page.

It’s been a summer of incredible transition for me as I’ve made a permanent move from the relatively chilly climate of New York (old house shown to the right) to the equatorial heat misery of South Carolina. I can only hope that this investment pays off in the winter when I’m enjoying a balmy 50-degree day while the Northeast shovels out of a blizzard.
I’ve not posted an “Accomplishments” blog since May, but that certainly shouldn’t indicate that I’ve not been pursuing Data Science over the summer. Far from it! Although I hadn’t completed any new courses or books in June and July, when I wasn’t busy packing up or tossing out all of my life’s possessions, I took advantage of the time to revisit a lot of the topics I’d covered in the past year. I began creating hundreds of Mnemosyne flashcards to sharpen my skillset. I retook the UoW Machine Learning: Regression Course, going over all code examples in painstaking detail. I also re-read every word of “An Introduction to Statistical Learning with Applications in R”, working through all of the R labs and exercises and incorporating sample code into my Mnemosyne card set. It was an absolutely necessary activity, and I feel much stronger as a result. Consider revisiting some old courses you’ve taken; you’d be surprised that you can still get something new from them with multiple tries.
In August, however, with the move complete, a number of endeavors came to a successful close.
Completed Items
Coursera Machine Learning: Clustering and Retrieval
This is the fourth course in the University of Washington Machine Learning Specialization on Coursera. Grouping and association were the theme here. Diving into large datasets of Wikipedia article entries, we found commonality between groups of articles, implemented various measures of “alikeness”, assigned articles to topics based on word groupings and made predictions on new articles based on models built from large training sets.
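To make “alikeness” a little more concrete, here is a minimal sketch of one common measure, cosine distance over raw word counts, used to find the closest article to a query. This is my own plain-Python illustration, not the course’s code (the course used GraphLab Create); the function names and the two-article toy corpus are made up for the example.

```python
import math
from collections import Counter


def word_counts(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def nearest_article(query, articles):
    """Return the title of the article whose word counts are closest to the query."""
    q = word_counts(query)
    return min(
        articles,
        key=lambda title: cosine_distance(q, word_counts(articles[title])),
    )


# Hypothetical toy corpus standing in for the Wikipedia articles.
articles = {
    "soccer": "the team scored a goal in the final match",
    "cooking": "simmer the sauce and season the pasta with salt",
}

print(nearest_article("a late goal won the match", articles))
```

Raw counts let common words like “the” dominate the distance; a weighting scheme such as TF-IDF is a typical way to reduce that effect.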
By far, this was the most challenging course of the series to date. It covered a number of topics I’ve seen before, such as Nearest-Neighbor searches, k-means Clustering and dendrograms. But the course also went into depth on a number of topics that I’d not seen before. These included:

Locality-sensitive hashing for approximate NN search
KD-trees
Gaussian mixture models
The EM algorithm
Latent Dirichlet allocation and Gibbs sampling
There was a lot of ground covered and the message boards did reflect some frustration with how much content was packed into each week. From my point of view, for $79, the more content the better! I honestly feel like I grasped about 80% of some of the advanced concepts, but I do feel like I was left with a solid foundation from which to pursue further inquiry.

The assignments, as usual, were self-guided IPython notebooks that walked you through the process of implementing a number of the mathematical algorithms with Python code. If taken slowly enough and with constant reference to notes taken during the lectures, they were achievable and added to the educational experience.
I will observe that questions posted in the forums went unanswered far more frequently than in previous courses. While there were fellow students who were incredibly helpful, you could find yourself on your own if you’re seeking assistance on a question.
Finally, there was a bit of drama during this course in that Turi (formerly Dato, formerly GraphLab), the company founded by one of the course creators, was acquired by Apple. It was questioned whether the primary product used in this course, GraphLab Create, would remain available to students or whether the professors themselves would even deliver on the remaining two courses (Recommender Systems & Dimensionality Reduction as well as the Capstone Project). If I sold a company off to Apple, I might just take my bags of cash to Tahiti and live out my life sipping Banana Daiquiris.

edX Berkeley U CS105x Introduction to Apache Spark
My studies in the past year have followed 4 different tracks: R, Python, Mathematics and the newcomer, Apache Spark. I initially was going to stay focused on the data analysis and algorithm techniques and not dive into the Big Data world (which is its own complementary specialty). However, out of curiosity, I took the short Udemy course Taming Big Data with Apache Spark and Python back in May, and was pleasantly surprised at how much this technology intrigued me.
The University of California, Berkeley has developed their own edX trilogy around this platform called Data Science and Engineering with Apache Spark. They’ve partnered with Databricks to bring students free access to Spark servers hosted on Amazon Web Services for a year.
The first course I completed was CS105x Introduction to Apache Spark. There are two parts to this 3-week course (although you are given 6 weeks to complete all the material at your own pace):

Short video lectures are given by Dr. Anthony D. Joseph, who has the most awkward teleprompter reading style. He pretty much reads the slides to you and doesn’t add much by appearing on camera. The content is very basic, and served as a nice counterbalance to the brain bruiser UoW Machine Learning course I was taking at the same time. You have to answer quiz questions based on the lectures, and they seemed more based on trivia contained in the lecture than on any real core concept.

Fortunately, these videos were only a small part of what was offered. The true value was in the self-paced IPython Notebook labs. These labs are where the real learning took place.
They guide you from basic Spark commands (for which you need some familiarity with Python), through small working examples that process the entire works of Shakespeare, to the final lab, where you analyze a month’s worth of NASA web logs.
I found these labs to be very clear with a straightforward progression of skillsets. By the end, you are cut free to perform your own analysis on a large dataset.
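For a flavor of what such a lab involves, here is a minimal word-count sketch. It assumes the Shakespeare exercise was a word count, which is my guess at a typical first lab rather than something stated above, and it is written in plain Python so it runs without a Spark cluster. The helper names are hypothetical stand-ins for the flatMap, map and reduceByKey steps you would chain on a PySpark RDD.

```python
import re
from collections import defaultdict


def flat_map_words(lines):
    """Stand-in for RDD.flatMap: turn each line into zero or more lowercase words."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word


def reduce_by_key(pairs):
    """Stand-in for RDD.reduceByKey with addition: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def top_words(lines, n):
    """Count words across all lines and return the n most frequent."""
    pairs = ((word, 1) for word in flat_map_words(lines))  # the "map" step
    counts = reduce_by_key(pairs)
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Two lines of Hamlet standing in for the complete works.
lines = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]

print(top_words(lines, 3))  # [('to', 3), ('be', 2), ('the', 2)]
```

On a real cluster the same three steps run in parallel across partitions of the text, which is what makes the pattern scale from two lines to a month of web logs.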
This was one of the few MOOCs I’ve encountered where there was very active participation from t