Saturday, September 01, 2012

Book Review: Introducing Regular Expressions by Michael Fitzgerald

Introducing Regular Expressions by Michael Fitzgerald
My rating: 4 of 5 stars

This is not the first time I've tried to learn regular expressions, but beyond some basic syntax, they never clicked. This book provided the kind of introduction I needed: I came away not just knowing the syntax, but understanding how regular expressions work and how to harness their power. In short, it is not a reference but a book that teaches you how to learn, and that is what I had been missing.

I've come across regular expressions on numerous occasions while figuring out how to do tasks in various programming languages and tools, but I always found the idea of actually learning them daunting. Whenever the need came up, I usually cobbled together some Frankenstein combination of tools, functions, and macros that could get the job done.

Michael Fitzgerald does not just present the mechanics of regular expressions; his exercises lead you through exploring how they work by adding (or removing) pieces of a pattern so you understand the effect of each feature being used. The chapters build up concepts that show how regular expressions go well beyond the simple searching offered by many text editors and word processors: starting with basic searches and pattern matching, they add markup, boundaries and anchors, back references, groups, alternation, character classes, Unicode, quantifiers, and lookarounds. Each step built up my understanding, and by the time I reached the last chapter I realized that I actually understood many of the concepts (new to me) that give regular expressions capabilities beyond the search and replace I was used to in my tools.
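
To make some of those terms concrete, here is a small sketch of my own (not taken from the book) using Python's built-in re module. The sample sentence and variable names are made up purely for illustration, and Python is only one of several regular expression dialects.

```python
import re

text = "the the cat sat on the mat; a bat, a hat"

# Word boundaries (\b) plus a back reference (\1) catch an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", text)

# A character class [bcmh] followed by the literal "at", bounded on both sides.
rhymes = re.findall(r"\b[bcmh]at\b", text)

# Alternation inside a non-capturing group: either "cat" or "bat".
pets = re.findall(r"\b(?:cat|bat)\b", text)

# Anchor: ^ only matches at the start of the string.
starts_with_the = re.match(r"^the\b", text) is not None

# Quantifier {2,}: two or more digits in a row.
prices = "items: 5, 42, 107"
multi_digit = re.findall(r"\b\d{2,}\b", prices)

# Lookahead (?=...) checks what follows without consuming it.
before_semicolon = re.findall(r"\w+(?=;)", text)

# Lookbehind (?<=...) checks what precedes without consuming it.
after_a = re.findall(r"(?<=a )\w+", text)

print(doubled.group(1))    # → the
print(rhymes)              # → ['cat', 'mat', 'bat', 'hat']
print(pets)                # → ['cat', 'bat']
print(starts_with_the)     # → True
print(multi_digit)         # → ['42', '107']
print(before_semicolon)    # → ['mat']
print(after_a)             # → ['bat', 'hat']
```

Tweaking one piece at a time (say, dropping the \b boundaries and watching "cat" start matching inside longer words) is the same add-or-remove style of exploration the book's exercises encourage.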

Another useful feature of the book is its introduction to numerous tools for working with regular expressions. He covers a few websites for testing regular expressions, as well as features or add-ins for a number of text editors. He also covers how regular expressions are used in Unix tools such as grep, sed, and awk.

One confusing area is the variety of regular expression dialects. As he switches between websites and tools, Fitzgerald mentions that some features don't work in all of them, but he does not explain how to tell the difference. I half remember hearing about different regular expression dialects (for example, Fitzgerald mentions that grep does not have full capabilities), and some guidance on this would have helped readers match the tools they are using with the development environments they work in. Instead of just observing that different tools behave differently, he could have identified a few major dialects and mapped each tool to its dialect (e.g. grep, Perl, Java, JavaScript). Even better, he could have picked one dialect for the book, stated up front that the book is based on it (in keeping with its being a pure introduction), and pointed out key areas where dialects differ along the way.
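
As a small illustration of the kind of dialect difference I mean, here is a sketch of my own (not from the book) using Python's re module. Two Python-specific behaviors show up: in Python 3, \d in a str pattern matches any Unicode decimal digit unless the re.ASCII flag is set (JavaScript's \d, by contrast, only matches 0-9), and Python writes named groups as (?P<name>...) where Java and JavaScript use (?<name>...). The sample strings are made up for illustration.

```python
import re

# Arabic-Indic digits one, two, three (U+0661..U+0663).
arabic_indic = "\u0661\u0662\u0663"

# Python 3: \d on a str pattern matches any Unicode decimal digit...
unicode_match = re.fullmatch(r"\d+", arabic_indic) is not None
# ...unless re.ASCII restricts it to [0-9].
ascii_match = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII) is not None

# Python's named-group syntax is (?P<name>...).
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Posted 2012-09-01")

print(unicode_match, ascii_match)        # → True False
print(m.group("year"), m.group("month"))  # → 2012 09
```

The same pattern text can therefore match different strings, or fail to compile at all, depending on which engine runs it, which is exactly why a dialect-to-tool mapping would help.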

Overall, a highly recommended introduction to regular expressions. In my data analysis work, I have reached the point where I need to process text automatically, because the volume is too large to handle manually. I already surprise my colleagues (who generally use MS Excel) with how much analysis I can automate; regular expressions will let me do things that are indistinguishable from magic.

Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Blogger program.