People approach data in different ways, especially across different fields. When you’re presented with a dataset that you have to convert to a graphic, what’s the first thing that you do?
I guess I scan through as much of the set as I can quickly, looking for trends, and sometimes I’ll try a simple graph or something to confirm my suspicions. Sometimes, this is done in conjunction with converting the dataset to whatever format I need it in, usually JSON.
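A first pass like that (scan for trends, try a quick check, convert to JSON) can be sketched in Python with just the standard library; the dataset and field names here are invented for illustration:

```python
import csv
import io
import json

# A small made-up dataset, standing in for whatever raw file you receive.
raw = """city,year,population
Portland,2010,583776
Portland,2011,593820
Seattle,2010,608660
"""

# Read the CSV and turn each row into a dict, then dump the lot as JSON --
# the format many charting tools expect.
rows = list(csv.DictReader(io.StringIO(raw)))
as_json = json.dumps(rows, indent=2)

# A quick scan for a trend before any real charting: is Portland growing?
portland = [int(r["population"]) for r in rows if r["city"] == "Portland"]
print("Portland growing?", portland == sorted(portland))
```

The same round trip works for any columnar file; only the column names change.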
I typically work in either Stata, MATLAB, or Excel.
Depending on the type of data, I will usually begin with time-trend graphs, histograms, or scatter/joint-density plots.
For me, I’ll start in Stata with some tabulations and density plots to get a sense of what’s in the dataset. A correlation matrix is also a nice tool for quickly seeing where relationships in the data might be. After these initial steps, it’s on to the more *interesting* work.
Generally, I first find out what question the data is there to answer, and then figure out how I can use that data to answer the question in the simplest and most concise way possible. After that it’s just a case of using the tools I have at my disposal (usually Excel) to provide the answer!
I fully agree with Matt:
1/ who is the user / requester?
2/ what is the question?
3/ where is the data coming from? when was it extracted? is it complete?
Only then do I consider the tool, and most likely I’ll try easy graphs first to find trends and outliers …
I currently have an assignment as a visualization designer/developer at a statistics agency. People from different statistical departments approach us to visualize their data. Sometimes these internal clients have a good idea of what they want to visualize, but often you have to educate people that a good visualization is more than a literal translation of tabular data. Our mantra is that you have to choose the story and keep the whole development process focused on just that story. Statistical specialists have a lot of in-depth knowledge about their data and want to show all the interesting details, but trying to satisfy all these demands often results in complex and confusing visualizations with far too many interaction possibilities.
So the first thing we do is: Ask the customer ‘What is the story that the visualization has to tell?’ (e.g. ‘educated people live longer’)
The second, third, fourth and so on thing we do is: Say to the customer that all these other fancy things he/she keeps on proposing are not in line with the original story that we are trying to visualize.
We are still learning how to create and design visualizations. Telling one story very clearly is already difficult enough, let alone two or more.
I do a lot of work with qualitative social media data, so it usually involves a lot of reading and identifying trends, then translating those findings into a typographic treatment. It sounds cheesy, but I always start knowing that there’s a story within the data that can be teased out, and it’s usually contrary to anything I could anticipate, or completely surprising.
Practically speaking, if I notice temporal trending (e.g., people talk about “Pauly Shore” whenever Jersey Shore comes on), I like to turn to my good friend, the plain ol’ line graph.
I import the data into R and examine the data.
SAS. SAS. SAS.
Making sense of the data is the first step. If it’s my own data, a bit of cleanup and some ordering and it should be ready to analyze. If it’s somebody else’s, understanding it is the priority, I would say. Then Excel or MATLAB is my friend, depending on the complexity of the question/problem and how fast we want the answer/solution.
I start with Linux command-line fu. It’s amazing how much the terminal can do for you.
Interesting. What command line tools do you use to get started with visualization?
I start with the usual Linux commands first (cut, join, sort, paste, awk). If a more complicated task is needed, then I use a Perl one-liner.
I convert the dataset to Stata, look at some summary statistics, frequency tables, and data matrices for subgroups, and then create exploratory scatter plots or bar charts (depending on whether the variables are metric or categorical).
I usually go on using Stata for almost all of my graphics and in the end export them to PDF or TikZ for LaTeX.
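This tabulate-first habit translates easily outside Stata as well. A minimal one-way frequency table in Python with only the standard library; the categorical column here is invented for illustration:

```python
from collections import Counter

# An invented categorical variable, standing in for a column in a real dataset.
region = ["north", "south", "north", "east", "south", "north"]

freq = Counter(region)
total = sum(freq.values())

# A one-way frequency table with counts and percentages,
# roughly what Stata's `tabulate` prints for one variable.
for value, count in freq.most_common():
    print(f"{value:<6} {count:>3} {count / total:6.1%}")
```

Crossing two such columns (a `Counter` over pairs) gives the two-way version, and the counts feed straight into a bar chart.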
First, check data quality and profile the data. Once trust in the data and its structural logic is established, it’s a safari.
Who the user is, that’s definitely the first question, at least as far as the design goes.
At the same time, I analyze the data, mess it up, and analyze it again until something interesting comes out, usually a consistent scheme or a correspondence between one part of the data and another.
I ask what the question is and help define it first.
After cleaning and formatting the data, I plug it into Tableau and usually start looking for bivariate relationships (and outliers) through scatterplots. It depends on the data set, though, and what I am planning on using it for (writing a column, making statistical projections, etc.).
“What is the story” is what I encountered the first time I sought feedback on a poster with my master’s research from someone other than my advisor. He said “explain this to me”, and I realized I couldn’t do it with the pictures I had put in front of him.
If the data don’t explicitly support the story, don’t try to force them to.
Circle, riff and dive. I circle the broad view of the dataset, then riff through the more interesting outliers and trends and finally dive more deeply into the interesting items that surface.
Write an equation to convert the data into a visually presentable microbob (bobs/sprites), embedding into each microbob data pointers for the user (point-and-click or whatever) for in-depth presentation of that microbob’s information (data), either in an expanded microbob presentation or a data stream, or both. It’s really up to the programmer and the programming architect.