Saturday, March 24, 2012

Book Review: Machine Learning for Hackers by Conway and White

Machine Learning for Hackers by Drew Conway
My rating: 5 of 5 stars

Machine Learning for Hackers is not a reference book or a standard programming tutorial on machine learning. For a reference, you go to Hastie, Tibshirani, and Friedman's 'The Elements of Statistical Learning.' For tutorials, there are a fair number of sources that could walk you through the R functions for regression, data exploration, classifiers, principal component analysis, etc. But what MLfH gives you is Drew Conway and John Myles White. They don't teach skills so much as pass on wisdom about how to work with data: how data needs to be explored, understood, and manipulated, and finally how to use machine learning methods to gain understanding.

In computer modeling in general and data analysis in particular, one thing that is often hard to convey is that the purpose of computing is not numbers, but insight. The effect of this problem is seen in graduates from even the best schools who know how to drive a computer program, but not how to interpret results, or how to ask a question and then take the results from that and ask the next question. The courses we teach and the texts that we use do not help. Our courses are each siloed to present a distinct portion of the total body of knowledge. Textbooks are often either theoretical or intended to provide a glimpse of application, but always in bounded chunks. Computer application books are often built around the capabilities of the program in question, and stop at the edge of what that application or environment can do. What is needed is not to teach methods or tools, but to teach wisdom. The ideal is to be able to sit side by side with an expert who can walk through a data set, ask questions, get answers, and think about what to do next, whether the answer is what was expected or not.

This is what Conway and White do. For each topic, they open with a discussion of the problem type and the tools, sometimes with a toy example. But then they go through a substantive example, and the narrative text is where they shine. They take a messy dataset (often in its publicly available form) and work through what needs to be massaged to get it into usable form. Next comes processing the data into the R data type needed for the analysis, then the initial exploratory steps where you gain an understanding of the problem and how to analyze it, and finally analysis and presentation.

I've been taught that in learning a programming language, it is often beneficial to have two books for reference (other than tutorials): one that is a proper reference (i.e., how to do something), and one that is a morality reference (i.e., how you should approach doing something). In data analysis, you should know the theory/methodology and how to use the tools at hand to apply the methodology, but also how to think about problems. Short of an apprenticeship with a master, MLfH does very well at this.

Disclaimer: I received a free electronic copy of this book from the O'Reilly Blogger Review Program. More information on this book can be found at the book web site.
