What I Use to Visualize Data

Posted to Guides  |  Nathan Yau

“What tool should I learn? What’s the best?” I hesitate to answer, because I use what works best for me, which isn’t necessarily the best for someone else or the “best” overall.

If you’re familiar with a software set already, it might be better to work off of what you know, because if you can draw shapes based on numbers, you can visualize data. After all, this guy uses Excel to paint scenery.

It’s much more important to just get started already. Work with as much data as you can.

Nevertheless, this is the set of tools I use in 2016, which has converged on a handful of things over the years. It looks different from what I used in 2009, and will probably look different in 2020. I break it down by place in my workflow.

Processing and Formatting Data

Assuming I have the data I want (big assumption), this is a stage of tedium. Solutions typically reflect my state of I-want-this-to-be-done-already, and I use whatever tool is closest. I would use a hammer if I could.

Python

When I have a non-rectangular delimited file, or the dataset is relatively messy, I write ad hoc Python scripts. If I’m lucky, I have an old script somewhere that I can repurpose. Sometimes I use the Beautiful Soup library for markup. I should probably look at csvkit.
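As a rough illustration of what such an ad hoc script can look like, here is a minimal sketch using only Python's standard library. The input format (an id followed by a varying number of `key=value` fields, separated by `|`), the sample rows, and the function names `rectangularize` and `to_csv` are all hypothetical, made up for this example rather than taken from any real dataset or script.

```python
import csv
import io


def rectangularize(lines, delimiter="|"):
    """Turn ragged 'id|key=value|key=value' lines into uniform dict rows."""
    rows = []
    columns = ["id"]  # column order follows first appearance in the input
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        record_id, *pairs = line.split(delimiter)
        row = {"id": record_id.strip()}
        for pair in pairs:
            key, _, value = pair.partition("=")
            key = key.strip()
            if not key:
                continue  # ignore empty fields such as a trailing delimiter
            row[key] = value.strip()
            if key not in columns:
                columns.append(key)
        rows.append(row)
    return columns, rows


def to_csv(columns, rows):
    """Write the rows as rectangular CSV text, leaving missing cells empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical ragged input: each record carries a different set of fields.
raw = """# made-up sample data
CA|year=2014|pop=38.6
TX|pop=27.0
NY|year=2014|pop=19.7|note=estimate
""".splitlines()

columns, rows = rectangularize(raw)
csv_text = to_csv(columns, rows)
print(csv_text)
```

The point of the sketch is the shape of the job: read line by line, collect whatever fields show up, and write out a file where every row has the same columns, so that the result loads cleanly into R or a spreadsheet.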

R

I only use R at this stage once I have a workable CSV file to load. So I’m only doing things like aggregating, merging, or deriving new variables from the original data.

Tabula

I tend towards public government data, and that involves data in PDF files. It’s a pain. Tabula makes it less painful.

Microsoft Excel

This is out of necessity. Data comes in Excel files, and importing into other programs like Numbers or OpenOffice just doesn’t cut it.

Google Sheets

Sometimes working with spreadsheets is quicker than scripting, and I appreciate the simplicity.

Analysis

You have to get to know a dataset before you work on a final graphic.

R

My thinking language is R. It’s an open source statistical computing language with a big community, a ton of packages, and a lot of answered questions on Stack Overflow.

Static Graphics

This is typically a two-stage process for me: (1) visualize in R and (2) touch up in Illustrator.

R

There are visualization packages in R, namely ggplot2, but I almost exclusively use R out of the box, also known as base R. I’ve written a lot of tutorials about the process.

Adobe Illustrator

If the graphic is for public consumption, I save the R graphics as PDF files and edit in Illustrator. It’s overkill for what I’m doing, but it works. I’m thinking about giving Sketch a try.

Interactive Graphics

Flash is out. JavaScript is in. R not so much.

d3.js

I use Data-Driven Documents these days for the interactive work (and I’m still learning). There are many examples to start from. Although if I’m after a quick chart, I might try Vega-Lite next time.

That’s it. I use other things, but this covers about 99% of what I do. The list of things is code-centric, which is more out of necessity than a love of programming. If something comes along that lets me make what I want with less time or effort, I’ll use that. But for now, this is what I have, and I think it’ll be this set for a while.
