Discussion: Why Machine Learning Beginners Shouldn’t Avoid the Math

In a post I published yesterday, I argued that it is important for students of machine learning to understand the algorithms and underlying mathematics prior to using tools or libraries that black box the code. I suggested that to do so is likely to result in a lot of “time-wasting confusion” due to students not having the necessary understanding to configure parameters or interpret results. One of the examples I provided for the opposing view was this blog post from BigML, which argues that beginners don’t need courses such as those provided by Coursera if they use their tool.

Francisco J. Martin, CEO of Big ML, has tweeted in response.

FJM

So Kids shouldn’t avoid assembler, automata, and compilers when learning to code?

This is a very good question and one that grants us an opportunity to dig deeper into the issue. I am responding here because I don’t believe it’s a question I can answer in 140 characters.

The short answer is no, I’m perfectly ok with beginner programmers starting out in high-level languages and working their way down, or even stopping there and not working their way down. But this is not analagous to machine learning.

I see three big differences.

First of all, learning a high-level language is actually a constructive step towards learning lower level languages. If that’s the goal, and you started with something like Java, you could potentially learn quite a lot about programming in general. Then trying C++ would help to fill in blanks with resect to some of the aspects of programming the Java glosses over. Likewise, Assembler could take you a step further.

If playing with the parameters of black-boxed algorithms offers a path at all towards becoming proficient at machine learning, it’s an incredibly innefficient one. It’s an awfully big search space to approach by trial and error when you consider the combinations of parameters, feature selection and the question of whether you have enough or appropriate data examples.

The second difference is that to do high-level programming does not require an understanding of low-level programming. I can do anything that Java or c# will let me do without knowing anything about assembly language.  In comparison, a machine learning tool requires me to know how to set appropriate values of parameters that are parsed into the hidden algorithms. They also require me to understand whether or not I have an appropriate (representative) dataset with appropriate features. Then when it finishes I need to be able to interpret the results and take appropriate actions. Better outcomes come from more informed decisions.

The third difference relates to the potential benefits of exploring the low-level languages. There are some exceptions to this, but generally speaking, writing more efficient algorithms in low-level languages comes at such great expense in comparison to the constantly falling cost of computation, that it just isn’t worthwhile.

In my last post I cited Kaggle’s chief scientist, Jeremy Howard, who said there was a massive difference in capability between good and average data scientists. I take this to indicate that in machine learning, more knowledge leads to exponentially better outcomes.  Unlike low-level programming, there is a huge benefit to having a detailed knowledge of machine learning.

I have come across some arguments suggesting that as Moore’s law reaches its limit, low-level coding will become much more sought after. If that happens I’ll revisit my position on low-level coding, but for now I’m betting that specialist processors like GPUs will help to bridge the gap before the next paradigm of computation comes along to keep the gravy train of exponential price-performance improvement going.

The Self-reinforcing Myth of Hard-wired Math Inability

There is a commonly held belief that some people have brains that are pre-wired for mathematical excellence, while everyone else is doomed to struggle with the subject. This toxic myth needs to be put deep in the ground and buried in molten lead. It is as destructive as it is self-fulfilling.

The myth equally encourages people who are good at math to falsely believe (Murayama et al., 2012) they are more intelligent than those who are not, and leaves everyone else inclined to believe they can never improve. This is despite the fact that math ability has very little to do with intelligence (Blair et al., 2007).

The reason this myth exists is well understood. School students who were well prepared by their parents in math prior to starting school find themselves separated in ability from their classmates who were not. The latter group consider the seemingly unachievable abilities of their peers and quickly lose confidence in their own abilities. Once that self-confidence is lost, any attempt at completing a math problem leads to math anxiety (Ashcraft et al., 2002; Devine et al., 2012), where thoughts of self-doubt cloud the mind and make it difficult to concentrate on the task at hand.

Mathematics, like computer programming, is a discipline that requires concentration. The student needs to be able to follow a train of thought where A leads to B leads to C etc. A student who lacks self-confidence struggles to maintain the necessary train of thought due to being repeatedly interrupted by negative thoughts about their abilities.  This results in poor performance and reinforces the idea that they are incapable of learning the subject.

It is interesting to see this belief so prevalent among software developers who are perfectly capable of writing an algorithm in a programming language, but suddenly feel that it is impossible to grasp the same algorithm represented by a set of mathematical symbols. There is simply no reason that this should be the case. I’ve yet to meet an experienced programmer who would tell me they find it near-impossible to learn the syntax of a new programming language and yet that is precisely what is entailed in learning how to express an algorithm using linear algebra.

A common point of confusion for many who haven’t done a lot of math since secondary school is in the use of mathematics as a language rather than a set of equations to be solved. In academic computer science, linear algebra, as it is used to express algorithms, is not something to be solved, but rather a language used to describe an algorithm.

Understanding the language of academic computer science is becoming increasingly important as the traditional staples of academia, such as machine learning, increasingly find use in industry.  After all, even if a software developer manages to avoid the math in their work, how can they expect to keep up with the latest developments in this fast-moving field without an ability to understand the academic literature?  Yet this is precisely what some software developers are attempting to do.

Math inability is not hard wired and software developers are already well practiced in the mental skills required.  We use the skill of stepping through a problem and visualising the state changes that occur at each step, every time we read or write a piece of code.  Anyone who can do that is capable of becoming proficient enough in mathematics to understand the mathematical components of the computer science literature.

References

Ashcraft, M. H. (2002). Math anxiety: Personal, educational, and cognitive consequences. Current directions in psychological science, 11(5), 181-185.

Blair, C., & Razza, R. P. (2007). Relating effortful control, executive function, and false belief understanding to emerging math and literacy ability in kindergarten. Child development, 78(2), 647-663.

Devine, A., Fawcett, K., Szűcs, D., & Dowker, A. (2012). Gender differences in mathematics anxiety and the relation to mathematics performance while controlling for test anxiety. Behavioral and brain functions, 8(1), 1.

Murayama, K., Pekrun, R., Lichtenfeld, S., & Vom Hofe, R. (2012). Predicting long‐term growth in students’ mathematics achievement: The unique contributions of motivation and cognitive strategies. Child development, 84(4), 1475-1490.

See Also

Andreescu, T., Gallian, J. A., Kane, J. M., & Mertz, J. E. (2008). Cross-cultural analysis of students with exceptional talent in mathematical problem solving. Notices of the AMS, 55(10), 1248-1260.

Berger, A., Tzur, G., & Posner, M. I. (2006). Infant brains detect arithmetic errors. Proceedings of the National Academy of Sciences, 103(33), 12649-12653.

Post Edits

13/07/2016 – Added references and see also sections.  Updated inline references to show primary sources rather than just linking to secondary sources.

14/07/2016 – Corrected typo in final paragraph “Math ability is not hard wired…” changed to “Math inability is not hard wired”.


Why Learn Machine Learning and Optimisation?

In this post I hope to convince the reader that machine learning and optimisation are worthwhile fields for a software developer in industry to engage with.

I explain the purpose of this blog and argue that we are in the midst of a machine-learning revolution.

______________________________________________________________

When I first started coding as a teenager in the early 1990s, the future looked certain to be shaped by artificial intelligence. We were told that we’d soon have “fifth generation” languages that would allow for the creation of complex software applications without the need for human programmers. Expert systems would replace human experts in every walk of life and we’d talk to our machines in much the same way Gene Roddenberry imagined we should.

file4711279208151

Unfortunately, this model of reality didn’t quite go to plan. After many years of enormous research and development expense — mainly focused in Japan — we entered another AI winter. The future was left in the hands of a handful of diehard academics, while the software industry mostly ignored AI research.

The good news is that the AI winter is now well and truly over.  The technology has been slowly but surely increasing its influence on mainstream software development and data analytics for at least a decade and 2015 has been billed as a breakthrough year by media sources such as Bloomberg and and Wired magazine.

Whether we realise it or not, most of us use AI every day. In fact, AI is responsible for all of the coolest software innovations you’ve heard of in recent years. It is the basis for autonomous helicopters, autonomous cars, big data analytics, google search, automatic language translation, targeted advertising, optical character recognition, speech recognition, facial recognition, anomaly detection, news-article clustering, vehicle routing and product recommendation, just to list the few examples I could name at the time of writing.

As a field, artificial intelligence has been deeply rooted in academia for decades, but it is quickly becoming prevalent in industry.  We are at the dawn of the AI revolution and there has never been a better time to start sciencing up your skill set.

This blog, is here to help and, as its name suggests, will focus on two important and complementary sub-fields of AI: Machine Learning and Optimisation. The intention is to explain both topics in a language that software developers in industry can easily understand, with or without a background in hard computer science.

I believe this is an important addition to the discourse on these topics because most of the sources you’re likely to come across assume a strong existing knowledge of linear algebra, calculus, statistics, probability, information theory and computational complexity theory: the language of academic computer science.  This is unsurprising, given that the techniques were mostly developed by computer scientists, mathematicians and statisticians, but it can unfortunately be a barrier to a lot of people getting started.

The intention here is to remove that barrier by describing the various techniques using familiar, medium-level programming languages.  The posts that follow will not shy away from the theory, but no assumptions will be made with respect to prior understanding of mathematics or computer science and code snipets will accompany any mathematical descriptions.