Are you looking to get into data visualization, but don’t quite know where to begin?
With all of the available tools to help you visualize data, it can be confusing where to start. The good news is, well, that there are a lot of (free) available tools out there to help you get started. It’s just a matter of deciding which one suits you best. This is a guide to help you figure that out.
But before we get into what you should use, a couple of questions.
What data are you looking at?
Hopefully you already have a dataset that you’re interested in. If not, go find one. It’s important to have actual data when you’re learning, because the visualization tool that you use will depend on it.
There are lots of places on the Web to find data. Here are a few worth checking out:
The above is a very small subset of what’s available. Oh, and let’s not forget all the government organizations that have departments dedicated to putting together datasets. Just pick one you’re interested in.
Got your data? Ok, good, on to the next step.
What’s the purpose of your visualization?
The next step is to figure out what you’re trying to do with your visualization. Are you working on a Web application that has some graphs? Is it an interactive tool? Do you want to use better-looking graphs in your slide presentation? Is the visualization for a publication? Do you just need it for analysis?
Again, what you decide here will affect what tool you should use.
What Visualization Software to Use
Now that you have the answers to those two questions in mind, we can make a decision on what will work best for you.
For Publication
This means graphics like what you see in the newspaper. Most people use Adobe Illustrator. It gives you control over all the elements in your graphic – color, stroke, font, orientation, etc.
If you want to do something more complicated than your traditional graphs, you can design it by hand in Illustrator or you can do it in R, a software environment for statistical computing and graphics (either programmatically or with one of the add-on libraries). From R, you can save your graphic as a PDF and open it in Illustrator. That’s usually what I do.
Illustrator is kind of pricey, however. Some have suggested using the open-source alternative Inkscape. I’ve never tried it though.
Example: The New York Times
For Presentations
Many want to add some spice to their presentations. You can use the same software as the above, but there’s also not much harm in using Microsoft Excel despite the stigma. The key here is not to use the default settings. You can actually do a lot in Microsoft Excel and make it look good. Plus, you don’t need to include many details in a graphic made for presentation slides, because people can’t see them from far away.
Personally, I don’t use it much for graphics since I’m comfortable with R and Illustrator.
For Analysis
There are a lot of analysis tools, and the preferred one will change depending on who you ask. I use R, which requires some programming skills. Most people use Excel. I’ve also heard a lot of good things about Tableau Software.
For Web Applications
I’m going to assume you have a programming background if you’re looking to do visualization for a Web application. If you don’t know anything about computer code, you can try Many Eyes or Fusion Charts. You’ll be limited to their offerings though.
Now, if you’re developing for the Web, there are two main options here. The first is Processing, which was designed to make coding easier and to give you more bang for the buck. Check out the site and Processing forums for plenty of tutorials and tips. The end result is a Java applet.
The second, more popular option is Flash. You can either do stuff in the actual Flash program, or you can use Actionscript for a pure coding solution. Either way, the end result is something that runs in the Flash environment. The Flare visualization toolkit is a good place to start.
The upside of Flash is that it tends to load faster than Java, and more people have Flash than Java installed on their computer. You might also be able to get away with just a little bit of code if you use just Flash, although, if you really want to get serious with visualization, you’ll need to learn Actionscript.
To that end, Processing is a lot easier to learn coding-wise. Plus it’s free and open source.
Examples: Many Eyes, Rescue Time
For Art
Processing definitely seems to be the software of choice for artists and designers. Again, it goes back to how easy it is to learn and how much you can do with it. Illustrator is the most common choice for non-interactive graphics since it gives you drag-and-drop control over all the elements.
Example: Processing Gallery
What Software Do You Use?
This is obviously a small subset of what’s available. Ultimately, visualization is not just about using one piece of software, but having a full toolbox at your disposal.
Here’s a list of all the programs, tools, and resources I frequently use. What do you use?
My doings lean a little in the art direction, but I’m originally from the infovis side.
I use Python for static graphics and for data analysis and cleaning, then Processing for realtime visualisation.
The numerical Python module, aka numpy, is great for data analysis and for getting one’s head quickly around the data.
Processing then has a nice speed for showing things.
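As a rough sketch of what "getting one’s head around the data" can look like in Python: the snippet below reads a small CSV and prints a few summary numbers. It sticks to the standard library so it runs without any installs (numpy provides the same kind of summaries, such as mean and median, on arrays). The sample data, the column names, and the summarize helper are all made up for illustration.

```python
import csv
import io
import statistics

# Made-up sample data standing in for a real CSV file (assumption).
SAMPLE_CSV = """city,temperature
A,12.0
B,15.5
C,9.0
D,21.5
"""


def summarize(values):
    """Return a few basic summary numbers for a list of floats."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Parse the CSV text and pull one numeric column out of it.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
temperatures = [float(row["temperature"]) for row in reader]

summary = summarize(temperatures)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With a real file, you would swap the io.StringIO wrapper for open("yourfile.csv", newline="") and pick whichever column you care about.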
I was disappointed you didn’t mention the JavaScript InfoVis Toolkit as a web visualization tool (http://thejit.org).
Oh wait, I got a full post a while ago! https://flowingdata.com/2009/06/05/javascript-infovis-toolkit-new-version-released/ :D
I’d say that if you’re looking for nice integration with web standards, the JavaScript InfoVis Toolkit is a good option. For larger datasets, Flare seems more suitable. Also, Flare uses Flash, which means that your visualization will work in 95% of browsers and machines (which is a good thing).
I also use a lot of Python and Beautiful Soup for manipulating data.
Hi Nathan,
I use the import functions in Google Docs (Spreadsheets) for data scraping.
Then I embed diagrams, Google Maps, tables, etc. in my blog.
I love Online-Tools ;-).
R and its various add-on packages, for more and more analysis and visualisation. RCurl comes in very handy for scraping data, along with ggplot, plyr and reshape, to name a few.
Also looking at workflows involving Excel -> R -> Powerpoint to speed up some chart generation.
I also use Inkscape a little to tidy up R plots and add additional annotation
I heart Processing, especially offline, because Java applets take annoyingly long to start. processing.js is getting much better though.
At work, most of the 10k charts we publish each year are done in Excel, then imported into Illustrator. Some are generated directly from Fame. Our online tools are usually done with Flash or Flex.
Python is very good for static visualizations. I like matplotlib a lot (http://matplotlib.sourceforge.net/). It’s a Python 2D plotting library.
Another awesome piece of software is NodeBox (also Python-based). Unfortunately, it’s for Mac users only. It’s a very good alternative to Processing for some specific jobs.
Hi. For vector graphics I use Inkscape. It’s free, far more usable than GIMP (another open source graphics program), and suits my needs very well. I love Processing, but what it cannot beat is Flash’s interactivity and its graphics rendering tree object model. These make Flash so much better for GUIs and interactive datavis. It can also be free if you use the open-source Flex SDK for Actionscript programming. I wouldn’t call it beginner level though.
Great post Nathan. I have been to each of the web-based data sources you mention but had forgotten about 2 of the 3.
Many of the other tools for publication on the web are new to me. I’m pretty heavily into Tableau, Excel and SQL and have looked at R, but frankly it looks like the learning curve is too long to be worth the investment when tools like Tableau are available (if you have the funds) which do a very good job.
Thanks for this post. Very useful.
The R learning curve is very much worth it if you ever need to do anything beyond simple data manipulation that you are comfortable with in other programs. If you need to do any statistical analysis (which in my opinion would help a lot of visualizations be a lot more informative), you should really look into R.
This one might help.
http://www.statmethods.net/index.html
Or this one if you know SAS or SPSS:
http://www.rforsasandspssusers.com/
which can be bought
http://www.amazon.com/R-SAS-SPSS-Users-ebook/dp/B001Q3LXNI/ref=dp_kinw_strp_1?ie=UTF8&qid=1217456813&sr=8-1
and of course, the R site. But all this needs MUCH time to learn so that you can smoothly use it.
These intro to R slides might help too:
http://forums.flowingdata.com/topic/introduction-to-r-slides
The data I work with is rarely clean or structured in a way ready to be vized. I use Lyza from http://Lyzasoft.com to not only connect to many data sources and transform, but to figure out what data to look at with profiling and verify it is valid. I then viz it with Tableau so it is interactive. Both apps let me do most of the complex stuff with drag and drop, and create many iterations and approaches to the data quickly.
I FINALLY convinced my company to get a couple Tableau Desktop licenses – well worth it!
We use it connected to an SSAS cube.
Here is an example of a good geotemporal one in action: http://tiny.cc/jFOgK
It’s called GeoTime
Interesting thread, Nathan. For me, usability for the analyst and understandability for the end user of the analysis are crucial, beyond all the bells and whistles of nice viz gadgets.
If the people who ordered the analysis don’t understand the viz, the work is useless.
This issue doesn’t come up much here, because statisticians love their tools. But it is the person who ordered the analysis who counts.
I use OpenOffice for table analysis, because it does the same as Excel, and it has comparable graphing capacity, but costs nothing.
And a crosstab is often the most complicated tool a non-statistician can understand.
For anything more complicated than table analysis, spreadsheets are not a good idea; you have to go for real stats packages. SPSS is a good tool that does not have the steep learning curve of R.
Most data are not clean when you get them: they have missing values, outliers, etc. The free ViSta (at http://www.visualstats.org and at http://www.uv.es/~prodat/ViSta ) lets you run a visual Exploratory Data Analysis, in addition to full multivariate stats, under Windows, Mac and Unix.
BTW You mean Lyzasoft, I guess, at http://www.lyzasoft.com.
Tableau is expensive and not as flexible as I’d like, but there’s a subset of tasks (basically making visualizations of 2 to ~5 dimensional data stemming from an SQL query) for which it is excellent.
Parallel Sets (see sig. URL) is interesting, but I haven’t used it enough to understand what it’s best suited for.
Frank,
The Vista sites look a little… dated (e.g. no mention of any OS after Windows 2000). Is it still under active development?
I run ViSta under Vista. It is still under – slow – development, the current version is 7.2. There is a user group, too.
The only product that we’ve come across that can cover all three of the traditional, single visualization techniques – spatial, temporal, and link analysis – is GeoTime. The others only paint a picture. GeoTime combines all three to tell the full story.
Thanks for the great summary.
For web visualizations, I’ve recently had great luck with flot.js for standard line charts. It’s highly customizable and extremely fast. http://code.google.com/p/flot/
Also, the Raphaël Javascript library is great for drawing custom SVG/VML (compatible across browsers), basically any vector graphics. http://raphaeljs.com/
A very cool SVG-based alternative, in alpha stage, is protovis (http://protovis.org/). It’s being written by Mike Bostock, a Stanford CS grad student of Flare’s author Jeff Heer. I like that you can write very concise JavaScript expressions to make quite sophisticated mappings from data to marks on screen.
Thanks for this post. I’ve been wrestling with Mozilla’s html pages lately, and it’s just terrible. Java and html parsing don’t seem to go together at all. I like the earlier user’s suggestions of using python for the data parse and then Processing for the actual viz step.
Not only is Processing pretty easy, but it has such a great user community. Even if I feel my questions are lame, I always get a useful answer.
I’ve been playing with Parallel Sets as well.
Definitely. Beautiful Soup is a good Python library for parsing HTML (and XML) pages:
http://www.crummy.com/software/BeautifulSoup/
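To give a feel for the "parse the page in Python, then visualize" step described above, here is a minimal sketch that pulls the rows out of an HTML table. Note that it uses html.parser from Python’s standard library rather than Beautiful Soup, purely so that it runs without installing anything. The sample HTML, the TableExtractor class, and the extract_rows helper are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page fragment standing in for a downloaded page (assumption).
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Value</th></tr>
  <tr><td>2007</td><td>12.5</td></tr>
  <tr><td>2008</td><td>14.1</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]

# Convert the text cells into numbers ready for plotting.
data = [(int(year), float(value)) for year, value in records]
print(header)
print(data)
```

Real pages are usually messier than this (nested tables, missing closing tags), which is where a more forgiving parser like Beautiful Soup earns its keep.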
Great pointers on public databases. Thanks much.
Definitely saving this post!
Google Spreadsheets has a Motion Gadget widget that is very nice to show time-based data. I first learned about it from watching Hans Rosling’s TED talks. Check out what he’s done at http://www.gapminder.org as well.
I guess I need to check out R. I’ve dabbled with Processing, but as a Flash+PHP developer, I typically stick to what I know best when trying to do a complex task.
Probably not the most ideal method, but I’ve actually used PHP (+GD2) when drawing images derived from large data sets that reside in MySQL databases. It was an easy way to go from query -> image manipulation without having to create an API or anything.
AWK and gnuplot.
Great thread…I agree about the many tools in the toolkit
One that I’m starting to like is ethority; for a BI solution, some thought has gone into its data visualization functionality.
It’s good for simple visualizations and simple data manipulation, it’s cost effective, and it can handle the conversion from a report layout to a viz.
Plus, it’s a SaaS solution as well.
Nice site!
I use Mathematica for pretty much anything where more than a couple of curves are required. Downsides are cost and a very steep learning curve. Upsides are programmatic graph generation, quality, and a wide variety of plot types.
For quick plots, Excel is fine.
What should also be mentioned is the Java prefuse Toolkit by Jeff Heer:
http://www.prefuse.org
The mentioned Flash toolkit flare is an offspring of its Java version prefuse…
As a note, John Resig (the principal author of jQuery, among other things) ported the Processing framework to JavaScript, which makes it much more web-friendly. It’s compatible with Firefox, Safari, Chrome, Opera, and (with the Explorer Canvas script), Internet Explorer, and it works quite nicely. It’s available at http://processingjs.org/.
We use SQL or MS Access connected to Tableau. Unlike Excel which has to be refreshed, Tableau is active and interactive. We and our clients love it!
I use http://visualizefree.com to create analytic visualizations. It’s free, web-based, and completely drag-and-drop.
For web presentations, amcharts is an option. http://www.amcharts.com/
Great post, Nathan. We use Tableau for brainstorming ideas (with whatever db is needed). Then our designers use Illustrator to mock up the viz. Then our programmers take it all into Flash. We’ve built a number of our own ActionScript libraries for dataviz.
Interesting to see what others are doing. Thanks!
And does Periscopic publicly release any of those ActionScript libraries? :)
We use JuiceKit (http://juicekit.org) , our open-source lightly patched Flare library integrated with features that make integration with Flex easier. In particular, we want to make it easy to make Flare visualizations that work with Flex data binding.
http://www.juiceanalytics.com/demos/airline/
http://media.juiceanalytics.com/census/treemap/TreeMapDemo.html (view-sourceable)
JuiceKit really isn’t for data-exploration–we use Excel and NodeBox for that (Tableau, of course is great too)–but about publishing NY Times quality visualizations.
I use MagnaView (http://www.magnaview.com) for data analysis and visualization.
The tool uses in-memory analysis and is able to show all data at once.