Introduction to regular expressions

Posted to Coding  |  Tags:  |  Nathan Yau

If you want to analyze bodies of text, it’s a good to know how to use regular expressions. That way you can programmatically extract complex text patterns instead of marking and encoding items manually. Thomas Nield for O’Reilly provides an introduction:

Many data science, analyst, and technology professionals have encountered regular expressions at some point. This esoteric, miniature language is used for matching complex text patterns, and looks mysterious and intimidating at first. However, regular expressions (also called “regex”) are a powerful tool that only require a small time investment to learn. They are almost ubiquitously supported wherever there is data.

Nield says it isn’t a steep learning curve, which I agree with, but I would suggest not trying to learn every part of the syntax. Learn it piecewise, and it’ll seem like less of a jumble of brackets, periods, and question marks.

See also the RegExr. It’s an interactive tool that lets you paste a body of text and then enter regular expressions to see what matches your given pattern in real-time.

Favorites

Think Like a Statistician – Without the Math

I call myself a statistician, because, well, I’m a statistics graduate student. However, the most important things I’ve learned are less formal, but have proven extremely useful when working/playing with data.

How to Spot Visualization Lies

Many charts don’t tell the truth. This is a simple guide to spotting them.

Interactive: When Do Americans Leave For Work?

We don’t all start our work days at the same time, despite what morning rush hour might have you think.

Shifting Incomes for American Jobs

For various occupations, the difference between the person who makes the most and the one who makes the least can be significant.