In this post I consider three learning approaches and argue that it could be a bad idea to avoid the mathematics and theory when starting out with machine learning.
There are three approaches to starting out in machine learning that I have seen practiced. One is a bottom-up approach, in which the student starts with the mathematics and theory and then puts it into practice in either a high-level programming language — such as Matlab, Python, R or Octave — or by coding from scratch in a 3GL like Java, C# or C++. The second is the top-down approach, in which machine learning tools and/or libraries are used to shelter the student from the coding, mathematics and theory. S/he is instructed to worry about how it all works later and to instead practice working with datasets. The third is more of a mixed approach, where the student works with tools and/or libraries, but is also thoroughly instructed in the theory so that the student understands how the algorithms work and what to expect from them from the outset.
My opinion on this issue is very much swayed by something Kaggle’s chief scientist, Jeremy Howard, had to say a couple of years ago. Kaggle, for the uninitiated, is an organisation that runs machine learning competitions, some of which offer monetary rewards. If anyone has a finger on the pulse of the machine learning and data science talent pool it’s Jeremy Howard. In a panel discussion hosted in July 2013 by the Churchill Club, Howard made the following statement:
We’ve measured the performance of our 100,000 users as they enter predictive modelling competitions to see who is the best at making accurate predictions… I know most of the guys at the top of that community, [and] the top 10 do stuff that the top 100 can’t dream of, the top 100 do stuff the top 1000 can’t dream of, and the top 1000 do stuff 100 times faster than the top 10000 can. There is this massive curve of capability and speed. There is very little [discussion] around at the moment of how we train the next generation of those top 10, how do we identify them… of our competition, nearly all of our winners from the past year actually learnt about machine learning by watching Andrew Ng’s YouTube lectures and Coursera lectures. Literally the best place to be trained right now is in online courses.
I want to make this point very clear. The guy responsible for evaluating the performance of one-hundred-thousand data scientists says there is a massive difference in capability between the best and average developers in the field, and that nearly all of the best developers he is aware of learned their trade from Andrew Ng.
So who is Andrew Ng? Ng (pronounced Ang) is a professor at Stanford University, chief scientist at Baidu, co-founder of Coursera and the former head of the Google Brain project. Ng is very passionate about machine learning and wants to share his knowledge with the world. Most importantly to this discussion, his teaching method is very much bottom-up. He starts with the theory and mathematics and teaches how to code the algorithms in high-level programming languages, usually GNU Octave.
Edit (19/05/2017): The suggestion that the bottom-up approach may be responsible for the success of those Kaggle competitors is my own suspicion, and not one shared by Jeremy Howard. See the post edit below this article for Howard’s view.
It really shouldn’t be surprising that developers who understand the ins and outs of the algorithms they are working with perform far better than those who don’t. In practice, much of the workload in implementing machine learning lies in the many decisions that have to be made at every stage of the development process. We all operate with limited resources, and when it comes to machine learning, poor decisions can be very costly in terms of time and money. A poor decision could result in months spent gathering data that isn’t needed, or is otherwise unfit for purpose. It could result in time wasted on sub-optimal parameters, poor feature selection or badly pre-processed data. It could result in inappropriate algorithms being selected in the first place, or in models that don’t generalise to unseen real-world examples. It could result in inappropriate responses to failures. In the real world, an understanding of the algorithms you are working with leads to better decisions, less waste and much better outcomes. Understanding the algorithms is of paramount importance in practice, in industry, not just in academia. This understanding is provided from the outset by both the bottom-up and mixed approaches. It is not provided early or efficiently by the top-down method.
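To make the pre-processing point concrete, here is a small sketch of my own (the numbers are made up, and it is not taken from any course material). A nearest-neighbour classifier compares raw distances, so a feature on a large scale, such as an income in dollars, can completely drown out a feature on a small scale, such as a score between 0 and 1, unless the data is standardised first. A developer who understands how the algorithm measures distance will spot this; one who treats it as a black box may not.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(query, points, labels):
    """1-nearest-neighbour: return the label of the closest training point."""
    distances = [euclidean(query, p) for p in points]
    return labels[distances.index(min(distances))]

def fit_scaler(points):
    """Compute per-feature mean and standard deviation from the training data."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    stds = [math.sqrt(sum((p[j] - means[j]) ** 2 for p in points) / n)
            for j in range(d)]
    return means, stds

def transform(point, means, stds):
    """Standardise one point to zero mean and unit variance per feature."""
    return [(v - m) / s for v, m, s in zip(point, means, stds)]

# Made-up data: feature 0 is an income in dollars, feature 1 a score in [0, 1].
# High scores belong to class "B", low scores to class "A"; income is just noise.
points = [[30000, 0.10], [32000, 0.90], [31000, 0.15], [29000, 0.85]]
labels = ["A", "B", "A", "B"]
query = [30500, 0.88]  # a high score, so "B" is the sensible answer

# On the raw data the income scale dominates the distance and the score is ignored.
print(nearest_label(query, points, labels))  # prints "A"

# After standardising, both features contribute and the score becomes decisive.
means, stds = fit_scaler(points)
scaled = [transform(p, means, stds) for p in points]
print(nearest_label(transform(query, means, stds), scaled, labels))  # prints "B"
```

The same query flips from the wrong class to the right one purely because of a pre-processing decision, which is exactly the kind of decision that algorithm understanding informs.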
The top-down approach seems seductive mainly because avoiding the math makes it look like the easier route to take. The student can ignore the mathematics and just dive into experimenting with data sets. But I know from my own experience that any time I have been tempted to jump in and start playing with algorithms I don’t yet fully understand, it has led to nothing more than a lot of unnecessary, time-wasting confusion. It is simply not an effective way to learn the trade.
Nonetheless, there are those who advocate a top-down approach. The BigML blog argues that with their — admittedly very nice looking — machine learning tool, we “don’t need Coursera to get started with machine learning”. Another example is Jason Brownlee’s Machine Learning Mastery blog, which suggests a four-step top-down plan, and techniques for learning machine learning without mathematics, using WEKA.
In fairness to Brownlee, he does emphasise elsewhere that a developer needs to understand the algorithms in order to become proficient, and he has even published a book designed to help software developers understand the math and theory underlying the algorithms. However, he seems to view this as something that can occur late in the learning process. I’ve also noticed a tendency to overstate the problem of learning the math, in statements like the following, taken from an email sent out to his subscribers (among whom I am counted):
A sticking point with machine learning is the math. You want to dive into the details of machine learning algorithms but you don’t want to spend the next 3 years studying advanced mathematics.
I honestly can’t see statements such as this having any effect on beginners who don’t have a strong mathematical background, other than increasing their — already far too prevalent — math anxiety. The idea that anyone would need to spend three years studying advanced mathematics to understand the math used by machine learning algorithms is ludicrous. In reality, it is all quite straightforward. Andrew Ng’s Coursera course teaches a large proportion of it in ten weeks. Each week the student learns the theory underlying a particular machine learning technique and then applies it by writing code. You work with concrete examples, not just abstract math, and you come away with a thorough understanding of each topic. The math does not have to be completely abstract, especially when the entire purpose of learning it is to apply it to a concrete area of application.
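As an example of how little advanced mathematics is really involved, the first technique in Ng’s course, univariate linear regression fitted by batch gradient descent, comes down to a handful of lines once the theory is expressed in code. This is my own minimal sketch in Python rather than the course’s Octave, with made-up data:

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit y = theta0 + theta1 * x by batch gradient descent,
    repeatedly stepping both parameters down the gradient of the
    mean squared error cost. alpha is the learning rate."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example under the current fit.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Data generated from y = 2x + 1, so the fit should recover roughly those values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # close to 1.0 and 2.0
```

A developer who can read that loop has, in effect, read the theory: the “advanced mathematics” is two partial derivatives and a repeated update rule.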
Machine learning tools are not necessarily counter-productive. WEKA, for instance, is a powerful tool with which the developer’s work lies in providing an appropriate data set and features, choosing which algorithm(s) to use, tuning parameter values, running the algorithms and reviewing the results. Not having to code the algorithms is very handy, especially since it frees the developer to try out a whole range of approaches, something that wouldn’t be practical if s/he had to code them all from scratch. But it is only useful if you already know how to configure the system and interpret the results.
Just to reflect on my own experience, I started out with Andrew Ng’s Coursera course and later covered additional aspects of machine learning in my MSc study, in which a mixed approach was taken, involving a combination of WEKA, theory (expressed in math) and coding machine learning algorithms from scratch in Java. I do think there is one thing that could have streamlined the learning process for me in both cases, and that is the inclusion of more of the theory expressed in a familiar programming language alongside the math. In my view, that is all that is needed to bridge the gap in understanding for those developers spooked by linear algebra, calculus, probability and statistics.
In summary, there is a massive disparity in the capabilities of machine learning developers, and I strongly suspect this is at least partly due to the top-down approach to learning the topic. Understanding the math is important if you really want to master machine learning. Rather than avoid it, let’s get some concrete coding examples out there and use them as a means to help industry developers understand the theory.
At the time of writing, it is still free to sign up to Andrew Ng’s Coursera course and I’d recommend it to anyone starting out. You can also find his Stanford Lectures here:
Post Edit: May 19, 2017
Jeremy Howard got in contact on Twitter to say he does not support the view that math and theory should come first, suggesting instead that the best Kagglers took a code-first approach.
I am not sure, though, how that fits with the 2013 quote above, which states that the competitors he was referring to started out with Andrew Ng’s course. Ng’s course begins with the mathematics and theory before proceeding to the code on each topic.
Howard is offering his own free 7-week course entitled Practical Deep Learning For Coders, which I’ll definitely be enrolling in once my MSc study concludes later this year.