What I Use to Visualize Data

Posted to Guides  |  Nathan Yau

“What tool should I learn? What’s the best?” I hesitate to answer, because I use what works best for me, which isn’t necessarily the best for someone else or the “best” overall.

If you’re familiar with a software set already, it might be better to work off of what you know, because if you can draw shapes based on numbers, you can visualize data. After all, this guy uses Excel to paint scenery.

It’s much more important to just get started already. Work with as much data as you can.

Nevertheless, this is the set of tools I use in 2016, which converged to a handful of things over the years. It looks different from 2009, and will probably look different in 2020. I break it down by place in my workflow.

Processing and Formatting Data

Assuming I have the data I want (big assumption), this is a stage of tedium. Solutions typically reflect my state of I-want-this-to-be-done-already, and I use whatever tool is closest. I would use a hammer if I could.

Python

When I have a non-rectangular delimited file, or the dataset is relatively messy, I write ad hoc Python scripts. If I’m lucky, I have an old script somewhere that I can repurpose. Sometimes I use the Beautiful Soup library for markup. I should probably look at csvkit.
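To give a sense of what one of these throwaway scripts might look like, here is a minimal sketch using only Python's standard csv module. The input text, the pipe delimiter, the three-column layout, and the clean_rows helper are all made up for illustration. The idea is just to skip the junk lines, pad or trim ragged rows, and write out a rectangular CSV.

```python
import csv
import io

# Hypothetical messy input: a title line, a blank line, ragged rows, a footnote.
RAW = """Table 1. Example counts by state

state|2014|2015
Alabama|10|12
Alaska|3
# Source: made-up numbers for illustration
Arizona|7|9|extra
"""

def clean_rows(text, delimiter="|", n_fields=3):
    """Return (header, rows) with every row padded or trimmed to n_fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header, rows = None, []
    for fields in reader:
        fields = [f.strip() for f in fields]
        # Skip blank lines, comment lines, and lines with no delimiter (titles).
        if not any(fields) or fields[0].startswith("#") or len(fields) == 1:
            continue
        # Pad short rows with empty strings and drop trailing extras.
        fields = (fields + [""] * n_fields)[:n_fields]
        if header is None:
            header = fields
        else:
            rows.append(fields)
    return header, rows

header, rows = clean_rows(RAW)

# Write the rectangular result as plain CSV text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

In practice the rules for what counts as junk change with every file, which is why these scripts tend to be ad hoc.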

R

I only use R at this stage once I have a workable CSV file to load. So I’m only doing things like aggregation, merging, or deriving new variables from the original data.
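The three operations named here are easy to show on a toy example. The sketch below is in Python rather than R, purely to keep the examples in this post in one language, and it is not the R workflow itself. The two small tables, their column names, and the per_capita field are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up CSV inputs standing in for files that are already clean.
SPENDING = """state,category,amount
CA,food,120
CA,rent,900
NY,food,150
NY,rent,1100
"""
POPULATION = """state,population
CA,39
NY,20
"""

def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

# Aggregation: total amount per state.
totals = defaultdict(float)
for row in read_csv(SPENDING):
    totals[row["state"]] += float(row["amount"])

# Merging: join the totals to the population table on the shared "state" key.
pop = {row["state"]: float(row["population"]) for row in read_csv(POPULATION)}
merged = [
    # Derived value: total amount divided by population.
    {"state": s, "total": t, "population": pop[s], "per_capita": t / pop[s]}
    for s, t in sorted(totals.items())
    if s in pop
]

for row in merged:
    print(row)
```

In R, the same steps would typically go through functions such as aggregate() and merge().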

Tabula

I tend towards public government data, and that involves data in PDF files. It’s a pain. Tabula makes it less painful.

Microsoft Excel

This is out of necessity. Data comes in Excel files, and importing into other programs like Numbers or OpenOffice just doesn’t cut it.

Google Sheets

Sometimes working with spreadsheets is quicker than scripting, and I appreciate the simplicity.

Analysis

You have to get to know a dataset before you work on a final graphic.

R

My thinking language is R. It’s an open source statistical computing language with a big community, a ton of packages, and a lot of answered questions on Stack Overflow.

Static Graphics

This is typically a two-stage process for me: (1) Visualize in R; and (2) touch up in Illustrator.

R

There are visualization packages in R, namely ggplot2, but I almost exclusively use R out of the box, also known as base R. I’ve written a lot of tutorials about the process.

Adobe Illustrator

If the graphic is for public consumption, I save the R graphics as PDF files and edit in Illustrator. It’s overkill for what I’m doing, but it works. I’m thinking about giving Sketch a try.

Interactive Graphics

Flash is out. JavaScript is in. R not so much.

d3.js

I use Data-Driven Documents these days for the interactive work (and I’m still learning). There are many examples to start from. Although if I’m after a quick chart, I might try Vega-Lite next time.

That’s it. I use other things, but this covers about 99% of what I do. The list of things is code-centric, which is more out of necessity than a love of programming. If something comes along that lets me make what I want with less time or effort, I’ll use that. But for now, this is what I have, and I think it’ll be this set for a while.
