Friday, November 09, 2012

Book Review: Python for Data Analysis by Wes McKinney

Python for Data Analysis by Wes McKinney
My rating: 5 of 5 stars

For some time now I have been using R and Python for data analysis. I had long ago discovered the Python technical stack of IPython, NumPy, SciPy, and Matplotlib, and I thought I knew what I was doing. I had even dipped my toe into pandas as my data structure for analysis. But Python for Data Analysis showed me entire worlds of improvement in my workflow and in my ability to work with data in the messy form found in the real world.

Python, like most interpreted languages, is slow compared to compiled languages. But with the technical stack that started with the NumPy library and has grown to include SciPy, Matplotlib (graphing), IPython (shell), and pandas, you get high-quality, fast algorithms and data structures from the Fortran and C libraries underneath Python. While these libraries are designed to be used together, documentation tends to cover only one at a time, and very little puts it all together as an integrated whole. McKinney's Python for Data Analysis fills that gap.
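To make the speed point concrete, here is a minimal sketch of my own (it is not an example from the book). It computes the same sum of squares twice: once with a plain Python loop, and once by handing the whole array to NumPy, which does the arithmetic in compiled code. The function names and the sample data are made up for illustration.

```python
import numpy as np


def python_sum_of_squares(values):
    """Sum of squares with an explicit, interpreted Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def numpy_sum_of_squares(values):
    """Same calculation, delegated to NumPy's compiled routines."""
    arr = np.asarray(values, dtype=float)
    return float(np.dot(arr, arr))


# Hypothetical sample data: 0.0, 0.5, 1.0, ... (1000 values)
data = [0.5 * i for i in range(1000)]

slow = python_sum_of_squares(data)
fast = numpy_sum_of_squares(data)
```

Both calls should give the same answer. On large arrays the NumPy version is typically much faster, though the exact gain depends on the machine and the array size, so it is worth timing on your own data (for example with %timeit in the IPython shell).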

Even though I have been using IPython, NumPy, SciPy, and Matplotlib for years, and pandas for about half a year, going through this book made me feel like a rank novice. I learned how to use the shell efficiently as a development tool, to the point that I have stopped automatically reaching for the IPython notebook or PyDev (Eclipse) when starting new projects. I use the shell instead, because its introspection and debugging capabilities make it much easier to work. I had started using pandas as a data structure because I liked the similarities with R data frames; this book showed me where pandas goes well beyond that. With Matplotlib I could make specific plots; this book showed me how to use the pandas interface to make them a natural part of the workflow (even if it is not yet at the level of a grammar such as ggplot2).
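As a small illustration of the kind of messy-data handling I mean, here is a sketch of my own, again not taken from the book. The column names and readings are invented. It builds a small table with missing values and averages one column per group; pandas skips the missing entries by default.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps, the way real-world data tends to arrive
readings = pd.DataFrame({
    "site": ["a", "a", "b", "b", "b"],
    "temp": [20.5, np.nan, 18.0, 19.0, np.nan],
})

# Group by site and average; NaN values are ignored by default
mean_by_site = readings.groupby("site")["temp"].mean()
```

With Matplotlib installed, mean_by_site.plot(kind="bar") would draw the result directly from the pandas object, which is the sort of plotting-as-part-of-the-workflow the book encourages.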

Python for Data Analysis does not just teach how to use the Python scientific stack; it also teaches a workflow for technical computing. This goes beyond what you can get from reading off the web. Learning it otherwise probably requires the opportunity to work alongside someone who knows what they are doing, to see the practices that make them productive. As such, I would recommend it for anyone who does scientific and technical computing, whether in the sciences, engineering, finance, or other areas where quantitative computing using Python is done.

Disclaimer: I received a free electronic copy of this book from the O'Reilly Blogger Program.


