What I Use to Visualize Data

Posted to Guides  |  Nathan Yau

“What tool should I learn? What’s the best?” I hesitate to answer, because I use what works best for me, which isn’t necessarily the best for someone else or the “best” overall.

If you’re familiar with a software set already, it might be better to work off of what you know, because if you can draw shapes based on numbers, you can visualize data. After all, this guy uses Excel to paint scenery.

It’s much more important to just get started already. Work with as much data as you can.

Nevertheless, this is the set of tools I use in 2016, which has converged to a handful of things over the years. It looks different from my 2009 set and will probably look different in 2020. I break it down by where each tool fits in my workflow.

Processing and Formatting Data

Assuming I have the data I want (big assumption), this is a stage of tedium. Solutions typically reflect my state of I-want-this-to-be-done-already, and I use whatever tool is closest. I would use a hammer if I could.

Python

When I have a non-rectangular delimited file, or the dataset is relatively messy, I write ad hoc Python scripts. If I’m lucky, I have an old script somewhere that I can repurpose. Sometimes I use the Beautiful Soup library to parse HTML and other markup. I should probably look at csvkit.
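
As an illustration only, here is a minimal sketch of the kind of throwaway cleanup script described above. It uses nothing but Python’s standard `csv` module (not Beautiful Soup or csvkit). The helper names `rectangularize` and `to_csv`, the footnote prefixes, and the sample rows are hypothetical. The sketch also assumes the first non-blank row is the header.

```python
import csv
import io


def rectangularize(lines, delimiter=",", footnote_prefixes=("Note", "Source")):
    """Turn ragged delimited text into a rectangular list of rows.

    Hypothetical helper: strips whitespace around each field, skips blank
    lines and footnote rows, then pads or trims every row to the width of
    the first kept row, which is assumed to be the header.
    """
    rows = []
    for row in csv.reader(lines, delimiter=delimiter):
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # blank line or a row of empty fields
        if fields[0].startswith(footnote_prefixes):
            continue  # footnotes tacked on below the table
        rows.append(fields)
    if not rows:
        return []
    width = len(rows[0])
    # Pad short rows with empty strings; extra trailing fields are dropped.
    return [(row + [""] * width)[:width] for row in rows]


def to_csv(rows):
    """Serialize rows to CSV text that R's read.csv() can load."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

For example, `rectangularize(["state, population ,rank", "", "Alabama, 4863300", "Alaska,741894,48,extra", "Note: estimates"])` keeps three rows, each three fields wide. The sketch silently drops extra trailing fields, and a real script might want to flag those instead.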

R

I only use R at this stage once I have a workable CSV file to load, so I’m limited to things like aggregating, merging, or deriving new data from the original.
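
The author does this step in R. Purely as a language-neutral illustration, the sketch below shows the same three operations in plain Python: group-and-sum aggregation, a key-based merge, and a derived per-100,000 rate. Base R’s `aggregate()` and `merge()` cover similar ground. The function names, column names, population lookup, and all numbers are made up.

```python
from collections import defaultdict

# Hypothetical example data: one row per county, plus a state-level lookup.
counties = [
    {"state": "CA", "county": "Alameda", "count": 120},
    {"state": "CA", "county": "Kern", "count": 80},
    {"state": "NV", "county": "Clark", "count": 90},
]
populations = {"CA": 1_000_000, "NV": 500_000}  # made-up numbers


def aggregate(rows, key, value):
    """Sum `value` within each group of `key` (a simple group-by)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def merge_and_derive(totals, lookup):
    """Inner-join totals to a lookup table and derive a rate per 100,000."""
    merged = []
    for state, total in sorted(totals.items()):
        if state not in lookup:
            continue  # inner join: drop keys with no match
        pop = lookup[state]
        merged.append({
            "state": state,
            "count": total,
            "population": pop,
            # Multiply before dividing to keep round numbers exact.
            "rate_per_100k": total * 100_000 / pop,
        })
    return merged
```

With the sample data, `aggregate(counties, "state", "count")` gives `{"CA": 200, "NV": 90}`, and the merged rows carry rates of 20.0 and 18.0.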

Tabula

I tend toward public government data, which often comes in PDF files. It’s a pain. Tabula makes it less painful.

Microsoft Excel

This is out of necessity. Data comes in Excel files, and importing them into other programs like Numbers or OpenOffice just doesn’t cut it.

Google Sheets

Sometimes working with spreadsheets is quicker than scripting, and I appreciate the simplicity.

Exploration

You have to get to know a dataset before you work on a final graphic.

R

My thinking language is R. It’s an open source statistical computing language with a big community, a ton of packages, and a lot of answered questions on Stack Overflow.
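
R’s `summary()` is a common first look at a numeric column. Below is a rough standard-library Python analogue, offered as an assumption rather than the author’s code. It returns the minimum, quartiles, mean, and maximum. Passing `method="inclusive"` to `statistics.quantiles` interpolates the same way as R’s default quantile type 7.

```python
import statistics


def summary(values):
    """Rough analogue of R's summary() for a numeric vector.

    Returns min, quartiles, mean, and max. method="inclusive" interpolates
    quartiles like R's default quantile type 7. Assumes at least two values.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "mean": statistics.mean(data),
        "q3": q3,
        "max": data[-1],
    }
```

For `[1, 2, 3, 4]`, this gives quartiles of 1.75, 2.5, and 3.25, which match R’s `summary(c(1, 2, 3, 4))`.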

Static Graphics

This is typically a two-stage process for me: (1) Visualize in R; and (2) touch up in Illustrator.

R

There are visualization packages in R, namely ggplot2, but I almost exclusively use R out of the box, also known as base R. I’ve written a lot of tutorials about the process.

Adobe Illustrator

If the graphic is for public consumption, I save the R graphics as PDF files and edit in Illustrator. It’s overkill for what I’m doing, but it works. I’m thinking about giving Sketch a try.
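
This echoes the earlier point that drawing shapes from numbers is all visualization takes. Here is a hypothetical sketch that writes a bar chart straight to SVG, a vector format that Illustrator can open and edit, much like the PDF files described above. It uses only Python’s standard library. The function name, dimensions, and color are illustrative choices, and the sketch stands in for the author’s R workflow rather than reproducing it.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(labels, values, width=400, bar_height=20, gap=5, label_width=80):
    """Return a horizontal bar chart as an SVG string.

    Each bar's length is scaled linearly so the largest value spans the
    plotting area. Assumes all values are positive.
    """
    plot_width = width - label_width
    max_value = max(values)
    height = len(values) * (bar_height + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(zip(labels, values)):
        y = i * (bar_height + gap)  # top edge of this row
        bar_width = value / max_value * plot_width  # map the number to a length
        # Label in the left margin, baseline near the bottom of the bar.
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}" font-size="12">{escape(label)}</text>'
        )
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{bar_width:.1f}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

You can write the returned string to a file such as `chart.svg` and then touch it up in Illustrator.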

Interactive Graphics

Flash is out. JavaScript is in. R not so much.

D3.js

I use D3 (Data-Driven Documents) these days for the interactive work, and I’m still learning. There are many examples to start from. That said, if I’m after a quick chart, I might try Vega-Lite next time.
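
D3 itself is a JavaScript library, so it isn’t shown here. For the quick-chart case, Vega-Lite describes a chart declaratively as JSON, which any language can produce. The sketch below builds a minimal bar-chart spec in Python. The v5 schema URL, the function name, the field names, and the sample records are all assumptions for illustration.

```python
import json


def vega_lite_bar(records, x_field, y_field):
    """Build a minimal Vega-Lite bar-chart spec from a list of dicts."""
    return {
        # Schema version is an assumption; use one your renderer supports.
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": records},  # inline data, one dict per row
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},       # categories
            "y": {"field": y_field, "type": "quantitative"},  # bar heights
        },
    }


if __name__ == "__main__":
    # Hypothetical sample records.
    sample = [{"category": "A", "amount": 28}, {"category": "B", "amount": 55}]
    print(json.dumps(vega_lite_bar(sample, "category", "amount"), indent=2))
```

To render it, paste the printed JSON into the online Vega Editor.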

That’s it. I use other things, but this covers about 99% of what I do. The list is code-centric, which is more out of necessity than a love of programming. If something comes along that lets me make what I want in less time or with less effort, I’ll use that. But for now, this is what I have, and I think it’ll be this set for a while.

