People approach data in different ways, especially across different fields. When you’re presented with a dataset that you have to convert to a graphic, what’s the first thing that you do?
I guess I scan through as much of the set as I can quickly, looking for trends, and sometimes I’ll try a simple graph or something to confirm my suspicions. Sometimes, this is done in conjunction with converting the dataset to whatever format I need it in, usually JSON.
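A first pass like that (scan for trends, try a quick check, convert to JSON) can be sketched in Python with just the standard library; the dataset and field names here are invented for illustration:

```python
import csv
import io
import json

# A small made-up dataset, standing in for whatever raw file you receive.
raw = """city,year,population
Portland,2010,583776
Portland,2011,593820
Seattle,2010,608660
"""

# Read the CSV and turn each row into a dict, then dump the lot as JSON --
# the format many charting tools expect.
rows = list(csv.DictReader(io.StringIO(raw)))
as_json = json.dumps(rows, indent=2)

# A quick scan for a trend before any real charting: is Portland growing?
portland = [int(r["population"]) for r in rows if r["city"] == "Portland"]
print("Portland growing?", portland == sorted(portland))
```

The same round trip works for any columnar file; only the column names change.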
I typically work in either Stata, MATLAB, or Excel.
Depending on the type of data, I will usually begin with time-trend graphs, histograms, or scatter/joint-density plots.
For me, I’ll start in Stata with some tabulations and density plots to get a sense of what’s in the dataset. A correlation matrix is also a nice tool for quickly seeing where relationships in the data might be. After these initial steps, it’s on to the more *interesting* work.
Generally, I first find out what question the data is there to answer, and then figure out how I can use that data to answer the question in the simplest and most concise way possible. After that it’s just a case of using the tools I have at my disposal (usually Excel) to provide the answer!
I fully agree with Matt:
1/ who is the user / requester?
2/ what is the question?
3/ where is the data coming from? when was it extracted? is it complete?
Only then do I consider the tool, and most likely I’ll try easy graphs first to find trends and outliers …
I currently have an assignment as a visualization designer/developer at a statistics agency. People from different statistical departments approach us to visualize their data. Sometimes these internal clients have a good idea of what they want to visualize, but often you have to educate people that a good visualization is more than a literal translation of tabular data. Our mantra is that you have to choose the story and keep the whole development process focused on just that story. Statistical specialists have a lot of in-depth knowledge about their data and want to show all the interesting details, but trying to satisfy all these demands often results in complex and confusing visualizations with far too many interaction possibilities.
So the first thing we do is: Ask the customer ‘What is the story that the visualization has to tell?’ (e.g. ‘educated people live longer’)
The second, third, fourth and so on thing we do is: Say to the customer that all these other fancy things he/she keeps on proposing are not in line with the original story that we are trying to visualize.
We are still learning how to create and design visualizations. Telling one story very clearly is already difficult enough, let alone two or more.
I do a lot of work with qualitative social media data, so it usually involves a lot of reading and identifying trends, then translating those findings into a typographic treatment. It sounds cheesy, but I always start knowing that there’s a story within the data that can be teased out, and it’s usually contrary to anything I could anticipate, or completely surprising.
Practically speaking, if I notice temporal trending (e.g., people talk about “Pauly Shore” whenever Jersey Shore comes on), I like to turn to my good friend, the plain ol’ line graph.
I import the data into R and examine the data.
SAS. SAS. SAS.
Making sense of the data is the first step. If it’s my own data, a bit of cleanup and some ordering and it should be ready to analyze. If it’s somebody else’s, understanding it is the priority, I would say. Then Excel or MATLAB is my friend, depending on the complexity of the question/problem and how fast we want the answer/solution.
I start with Linux command-line fu. It’s amazing how much the terminal can do for you.
Interesting. What command line tools do you use to get started with visualization?
I start with the usual Linux commands first (cut, join, sort, paste, awk). If a more complicated task is needed, then I use a Perl one-liner.
I convert the dataset to Stata, look at some summary statistics, frequency tables, and data matrices for subgroups, and then create exploratory scatter plots or bar charts (depending on whether the variables are metric or categorical).
I usually go on using Stata for almost all of my graphics and in the end export them to PDF or TikZ for LaTeX.
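This tabulate-first habit translates easily outside Stata as well. A minimal one-way frequency table in Python with only the standard library; the categorical column here is invented for illustration:

```python
from collections import Counter

# An invented categorical variable, standing in for a column in a real dataset.
region = ["north", "south", "north", "east", "south", "north"]

freq = Counter(region)
total = sum(freq.values())

# A one-way frequency table with counts and percentages,
# roughly what Stata's `tabulate` prints for one variable.
for value, count in freq.most_common():
    print(f"{value:<6} {count:>3} {count / total:6.1%}")
```

Crossing two such columns (a `Counter` over pairs) gives the two-way version, and the counts feed straight into a bar chart.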
First, check data quality and profile the data. Once trust in the data and its structural logic is established, it’s a safari.
Who the user is, that’s definitely the first question, at least as far as the design goes.
At the same time, I analyze the data, mess it up, and analyze it again until something interesting comes out, usually a consistent scheme or a correspondence between one part of the data and another.
I ask what the question is and help define it first.
After cleaning and formatting the data, I plug it into Tableau and usually start looking for bivariate relationships (and outliers) through scatterplots. It depends on the data set, though, and what I am planning on using it for (writing a column, making statistical projections, etc.).
“What is the story” is what I encountered the first time I sought feedback on a poster with my master’s research from someone other than my advisor. He said “explain this to me”, and I realized I couldn’t do it with the pictures I had put in front of him.
If the data don’t explicitly support the story, don’t try to force them to.
Circle, riff and dive. I circle the broad view of the dataset, then riff through the more interesting outliers and trends and finally dive more deeply into the interesting items that surface.
Write an equation to convert the data into a visually presentable microbob (bobs/sprites), embedding into each microbob data pointers for the user (point-and-click or whatever) for in-depth presentation of that microbob’s information (data), either in an expanded microbob presentation or a data stream, or both. It’s really up to the programmer and the programming architect.