Sunday, February 16, 2014

Mining the Social Web, 2nd ed by Matthew Russell: Book review

Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More by Matthew A. Russell
My rating: 4 of 5 stars

The hardest part of learning a data analysis method is not implementing the method; it is applying the method in the context of a real data problem. Data mining and machine learning texts often skirt the issue by using pre-processed data sets and problems defined to fit the method being taught. Russell instead uses the analysis of social media sites as his context. You start by gaining access to real data sets, then clean and transform the data into forms your analytical libraries can make sense of, and finally use the results to draw conclusions. For that alone, it earns a place alongside any text that focuses more on analytical methodology itself.

What I most appreciated about this book was the work put into converting data from one format to another. He begins by pulling data through a service's API, then converts it into the format another library requires, and finally feeds those results into a data mining framework for analysis. Following his flow has helped me understand the methods better. Processing data from one format to the next is exactly where my students get stuck before they really get started on a project. I especially appreciated the chapters that work with the Natural Language Toolkit (NLTK) and the NetworkX graph library. These examples helped me get past what had been the hard part for me in previous encounters with these libraries.

The virtual machine is also very helpful. I have always found that the hardest part of using Python for analytic computing is teaching my collaborators how to get set up, and in data mining this is even harder than usual. I was able to work through the book after installing everything on one machine, but on another I used the author's virtual machine, and I have pointed a student who was working with me to the virtual machine as well.

This is a great book for working through the mess of implementing data mining methods in real situations. It is not a theory book, but it serves its purpose well.

Note: I received a free electronic copy of this book from the O'Reilly Press Blogger program.
I review for the O'Reilly Reader Review Program
