This is the third of a set of reading lists written from the point of view of a software engineer who wants to develop their knowledge of machine learning. In Part 1, we looked at some introductory books to the discipline and in Part 2 we looked at programming books. In this post, we'll look at some of the more advanced text books.
Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach. There's more mathematical emphasis in this one and it leaves out neural networks entirely, perhaps an effect of being written just before that field re-emerged in 2012.
Machine Learning by Tom Mitchell. It's hard to recommend this outright, as much as I'd like to, because it can be expensive to buy these days. It's twenty years old and dated in parts, although it does have good sections on neural networks and reinforcement learning (they were big in the Nineties!). Shop around and consider getting a second-hand copy if you like the look of it.
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. This has become the default text book for deep learning. There aren't many books to recommend at an introductory level on deep learning that aren't specific to a framework, so most roads lead to this one. If you're comfortable with the mathematics (you can preview it in the opening chapter), go ahead and dive in. Even without the math background, I'd still recommend it.
Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto. Twenty years on, Sutton & Barto is still the definitive text on Reinforcement Learning. If you'd asked me back then (around the time I got my AI degree) what the most promising area in machine learning was, I'd probably have said Reinforcement Learning. That hasn't quite worked out; there's a lot of interest in RL, but it's proven difficult to apply, notable exceptions like AlphaGo aside. The book is a nice step beyond Bandit Algorithms for Website Optimization mentioned in Part 1, and good background reading if you're playing around with something like OpenAI's Gym or are working on optimisation problems like A/B testing. A second edition is in the works.
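To make the A/B testing connection concrete, here's a minimal sketch of an epsilon-greedy bandit, the simplest algorithm covered in both books. Everything here (the function name, the simulated success rates, the parameter values) is illustrative, not taken from either text:

```python
import random

def epsilon_greedy(arms, pulls=10000, epsilon=0.1, seed=0):
    """Pick the best 'arm' (e.g. an A/B test variant) while still exploring.

    `arms` is a list of true success probabilities -- hidden from the
    algorithm itself, used only to simulate user feedback.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)    # how many times each arm was pulled
    values = [0.0] * len(arms)  # running mean reward per arm
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arms))   # explore: random arm
        else:
            arm = values.index(max(values))  # exploit: current best arm
        reward = 1.0 if rng.random() < arms[arm] else 0.0
        counts[arm] += 1
        # incremental update of the mean, as in Sutton & Barto's opening chapters
        values[arm] += (reward - values[arm]) / counts[arm]
    return values.index(max(values)), counts

# Three hypothetical page variants converting at 4%, 5% and 8%.
best, counts = epsilon_greedy([0.04, 0.05, 0.08])
print(best, counts)  # with enough pulls, usually settles on the 8% variant
```

Unlike a classic fixed-horizon A/B test, the algorithm shifts traffic towards the winner as it learns, which is exactly the explore/exploit trade-off the RL literature formalises.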
The Elements of Statistical Learning (ESL) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This is the standard advanced text book on statistical learning. It's the big brother to An Introduction to Statistical Learning from Part 1, comprehensive and covering a lot of ground. The book unifies ML approaches around recurring challenges like bias and dimensionality. The math, I reckon, is graduate level and very hard going (for me at least) unless you have a solid background. Even so, it's extremely well written, a fine book to dip into, lose time on a topic, and have your brain melted. Ideal as a reference book and for research.
Machine Learning: A Probabilistic Perspective (MLAPP) by Kevin Murphy. This is a highly regarded and comprehensive overview of machine learning. Like Peter Flach's book, the content reflects its 2012 publication date, leaving out deep learning and reinforcement learning to focus on other topics. It's very heavy on math and catalogues techniques rather than explaining them. I'd argue the Flach, Norvig, or Mitchell books are more approachable for engineers. Definitely best as a reference text for researchers and analytics professionals.
Pattern Recognition and Machine Learning (PRML) by Chris Bishop. This is a classic text, one you're "supposed" to read. In some respects Bishop's book is to machine learning as Knuth's is to algorithms, and you'll come across it again and again. Even so, I'm reluctant to recommend it for engineering purposes, as it's effectively a maths textbook, and there is some suggestion that Kevin Murphy's book is more current and comprehensive.
From an engineering standpoint it's difficult to say any of the books here are must-reads, as they're optimised for professional scientists and researchers rather than software engineers. The exceptions are probably Deep Learning and Reinforcement Learning: these are important areas where comprehensive introductory-level texts either don't exist or are focused on frameworks.
The value these books bring comes when you decide to really dig into a particular area. They also make you aware of the sheer amount of material data scientists and machine learning researchers are expected to know, on top of a forever war with data sets and distributed systems. Understanding the sheer "stack depth" colleagues have to deal with is hugely motivating to me as a builder, and makes it clear we can't have enough good tools and systems to support machine learning work.
In Part 4, we'll switch from books to online papers and blog posts.