In a post I published yesterday, I argued that it is important for students of machine learning to understand the algorithms and underlying mathematics prior to using tools or libraries that black box the code. I suggested that to do so is likely to result in a lot of “time-wasting confusion” due to students not having the necessary understanding to configure parameters or interpret results. One of the examples I provided for the opposing view was this blog post from BigML, which argues that beginners don’t need courses such as those provided by Coursera if they use their tool.
Francisco J. Martin, CEO of Big ML, has tweeted in response.
So Kids shouldn’t avoid assembler, automata, and compilers when learning to code?
This is a very good question and one that grants us an opportunity to dig deeper into the issue. I am responding here because I don’t believe it’s a question I can answer in 140 characters.
The short answer is no, I’m perfectly ok with beginner programmers starting out in high-level languages and working their way down, or even stopping there and not working their way down. But this is not analagous to machine learning.
I see three big differences.
First of all, learning a high-level language is actually a constructive step towards learning lower level languages. If that’s the goal, and you started with something like Java, you could potentially learn quite a lot about programming in general. Then trying C++ would help to fill in blanks with resect to some of the aspects of programming the Java glosses over. Likewise, Assembler could take you a step further.
If playing with the parameters of black-boxed algorithms offers a path at all towards becoming proficient at machine learning, it’s an incredibly innefficient one. It’s an awfully big search space to approach by trial and error when you consider the combinations of parameters, feature selection and the question of whether you have enough or appropriate data examples.
The second difference is that to do high-level programming does not require an understanding of low-level programming. I can do anything that Java or c# will let me do without knowing anything about assembly language. In comparison, a machine learning tool requires me to know how to set appropriate values of parameters that are parsed into the hidden algorithms. They also require me to understand whether or not I have an appropriate (representative) dataset with appropriate features. Then when it finishes I need to be able to interpret the results and take appropriate actions. Better outcomes come from more informed decisions.
The third difference relates to the potential benefits of exploring the low-level languages. There are some exceptions to this, but generally speaking, writing more efficient algorithms in low-level languages comes at such great expense in comparison to the constantly falling cost of computation, that it just isn’t worthwhile.
In my last post I cited Kaggle’s chief scientist, Jeremy Howard, who said there was a massive difference in capability between good and average data scientists. I take this to indicate that in machine learning, more knowledge leads to exponentially better outcomes. Unlike low-level programming, there is a huge benefit to having a detailed knowledge of machine learning.
I have come across some arguments suggesting that as Moore’s law reaches its limit, low-level coding will become much more sought after. If that happens I’ll revisit my position on low-level coding, but for now I’m betting that specialist processors like GPUs will help to bridge the gap before the next paradigm of computation comes along to keep the gravy train of exponential price-performance improvement going.