What I Use to Visualize Data

Posted to Guides  |  Tags:  |  Nathan Yau

“What tool should I learn? What’s the best?” I hesitate to answer, because I use what works best for me, which isn’t necessarily the best for someone else or the “best” overall.

If you’re familiar with a software set already, it might be better to work off of what you know, because if you can draw shapes based on numbers, you can visualize data. After all, this guy uses Excel to paint scenery.

It’s much more important to just get started already. Work with as much data as you can.

Nevertheless, this is the set of tools I use in 2016, which converged to a handful of things over the years. It looks different from 2009, and will probably look different in 2020. I break it down by place in my workflow.

Processing and Formatting Data

Assuming I have the data I want (big assumption), this is a stage of tedium. Solutions typically reflect my state of I-want-this-to-be-done-already, and I use whatever tool is closest. I would use a hammer if I could.

Python

When I have a non-rectangular delimited file, or the dataset is relatively messy, I write ad hoc Python scripts. If I’m lucky, I have an old script somewhere that I can repurpose. Sometimes I use the Beautiful Soup library for markup. I should probably look at csvkit.

R

I only use R at this stage once I have a workable CSV file to load. So I’m only doing things like aggregation, merging, or data derived from the original.

Tabula

I tend towards public government data, and that involves data in PDF files. It’s a pain. Tabula makes it less painful.

Microsoft Excel

This is out of necessity. Data comes in Excel files, and importing into other programs like Numbers or OpenOffice just don’t cut it.

Google Sheets

Sometimes working with spreadsheets is quicker than scripting, and I appreciate the simplicity.

Analysis

You have to get to know a dataset before you work on a final graphic.

R

My thinking language is R. It’s an open source statistical computing language with a big community, a ton of packages, and a lot of answered questions on Stack Overflow.

Static Graphics

This is typically a two-stage process for me: (1) Visualize in R; and (2) touch up in Illustrator.

R

There are visualization packages in R, namely ggplot2, but I almost exclusively use R out of the box, also known as base R. I’ve written a lot of tutorials about the process.

Adobe Illustrator

If the graphic is for public consumption, I save the R graphics as PDF files and edit in Illustrator. It’s overkill for what I’m doing, but it works. I’m thinking about giving Sketch a try.

Interactive Graphics

Flash is out. JavaScript is in. R not so much.

d3.js

I use Data-Driven Documents these days for the interactive work (and I’m still learning). There are many examples to start from. Although if I’m after a quick chart, I might try Vega-Lite next time.

That’s it. I use other things, but this covers about 99% of what I do. The list of things is code-centric, which is more out of necessity than a love of programming. If something comes along that let’s me make what I want in a less amount of time or effort I’ll use that. But for now, this is what I have, and I think it’ll be this set for a while.

Favorites

Most popular porn searches, by state

We’ve seen that we can learn from what people search for, through the eyes of Google suggestions: state stereotypes, national …

Where People Run in Major Cities

There are many exercise apps that allow you to keep track of your running, riding, and other activities. Record speed, …

Jobs Charted by State and Salary

Jobs and pay can vary a lot depending on where you live, based on 2013 data from the Bureau of Labor Statistics. Here’s an interactive to look.

Who is Older and Younger than You

Here’s a chart to show you how long you have until you start to feel your age.