Visualize This: The FlowingData Guide to Design, Visualization, and Statistics

Learn to data.

  • With the unveiling of the brand new iPhone 3G, Twitter has been buzzing with excitement. One of the more interesting new iPhone features is built-in GPS. Your iPhone will know when and where it is, opening up tons of possibilities for location-based applications – one of them being personal sensing, or rather, participatory sensing.

    Seeing the World in Data

    This is what I’ve been heavily involved with lately, working with the UCLA Center for Embedded Networked Sensing. Instead of iPhones, we use Nokia N80s. The idea is that individuals can use existing mobile technologies to gather and analyze data about the world around them.

    On With the Show

    Here’s our super cool, unbelievably awesome video taking a look at the near future of personal data collection with everyday mobile phones:

    A little corny, yes, but informative.

    How can non-experts make use of such huge amounts of data? I’m glad you asked! Visualization, of course. More on this later.

  • The most recent FlowingData poll asked what you use to analyze and/or visualize data. Thanks to all 347 of you who participated.

    I was surprised by the percentage of you who mainly use Microsoft Excel, mostly because last month’s poll showed a near majority of you in computer science, design, and statistics. Although, R did have a strong showing too. Maybe it’s the information scientists and business folks representing for Excel?

  • Data visualization means different things to different people. To some it’s an analytical tool while to others it’s a way to make a statement. In my experience, those interested in data visualization fall into these five categories.

    The Technician

    Technicians are all about implementation. They have a strong programming background with experience in Processing, Actionscript, or some other similar language and have probably worked with large databases at one point or another. To technicians, aesthetics are not as important as getting things to work. Only after everything (database, hardware, code) is hooked together does the technician try to spruce things up. Show them a visualization and they’ll want to know how it was made.

    The Analyzer

    Data is the priority for analyzers. Like technicians, analyzers don’t treat aesthetics as the greatest concern; rather, they want to know the relationships between variables and find positive and negative trends, and they are the most likely to tell you that you should have used a different type of graph or chart for that dataset. Tools like R, Microsoft Excel, and SAS are analyzers’ weapons of choice. Many will have programming experience but don’t code as well as technicians. Show an analyzer a visualization and they’ll most likely comment on the (complex) patterns they see.

    The Artist

    Artists are obsessed with the final product: what the visualization will finally look like. They are the designers who are most likely to think long and hard about colors, visual indicators, and whether or not that square box should be moved 2 pixels to the left. Programming is usually not a strong point, but when it is, it’s most likely in Processing. Their weapon of choice, though, is the Adobe Creative Suite, namely Illustrator and Photoshop. Artists are most likely to tell you that something is ugly.

    The Outsider

    The outsider is the one with a complex data set who isn’t quite sure what to do with it. Outsiders are the field experts who want to visualize their data but might not have the know-how to follow through. They can, however, provide plenty of context and usually have a sense for what their data is about. You’ll most often see the outsider with a pen and paper explaining things to the technician, analyzer, and artist.

    The Light Bulb

    Light bulbs are the idea people. They’ve got some programming, design, and analytical experience, but they’re not necessarily experts in all three areas. Because of this breadth of experience, the brighter bulbs can usually handle a large data visualization project on their own (given the time). Knowing what is and isn’t possible, light bulbs lead projects and can delegate work across a team. It’s all about the big picture for the bulbs, and the brightest are like the zen masters of data visualization.

    I consider myself some combination of the analyzer and technician. I’m still searching for the artist in me. I’ve got some design experience, but there’s still a lot to learn – always more to learn.

    What data visualizer type are you?

  • The above New York Times graphic shows where each candidate got his or her support from. The x-axis (horizontal) represents strength of support and the y-axis shows the number of states.

    On the surface, it’s a stacked bar chart, but the animation as you browse the groups (e.g. under age 30, whites, blacks) makes things interesting. Highlight a state and watch it move left to right and right to left, or just click on “blacks” and watch all the states shoot to the right in support of Obama. FlowingData readers will recognize the names of the skilled graphics editors who made the graphic: Shan Carter and Amanda Cox.

    [Thanks, Chris]

  • Cory Doctorow from The Guardian writes about our inability to understand the statistics of rare events. We obsess so much over the near-impossible probability that something could happen that it clouds our vision of more probable events.

    The rare – and the lurid – loom large in our imagination, and it’s to our great detriment when it comes to our safety and security. As a new father, I’m understandably worried about the idea of my child falling victim to some nefarious predator Out There, waiting to break in and take my child away. There’s a part of me who understands the panicked parent who rings 999 when he sees some street photographer aiming a lens at a kids’ playground.

    But the fact is that attacks by strangers are so rare as to be practically nonexistent. If your child is assaulted, the perpetrator is almost certainly a relative (most likely a parent). If not a relative, then a close family friend. If not a close family friend, then a trusted authority figure.

    Says Doctorow, such misunderstanding is why we gamble in casinos and why we have to wait in long security lines at the airport. We see piles of money and terrorist attacks, when ultimately the chances that you’ll win a jackpot or encounter violence are far smaller (near impossible) than the chances of losing all of your money or losing valuables to a curious luggage handler.

    If there’s one thing the government and our educational institutions could do to keep us safer, it’s this: teach us how statistics works.

    Amen to that.

    [Thanks, Jan]

  • Want to have some fun and win an Amazon gift certificate in the process? Read on.

    Personal data visualization has a huge advantage over other types of visualization. Personal visualization is about you, for you, and the data is from you. That’s a ton of background information with very little effort. As Jeffrey Heer noted in Socializing Visualization, people tend to spend more time exploring data when they connect personally to what they are seeing.

    This Project’s For You

    Running with this idea, this summer project is all about you, literally.

    Take a moment and think about the data “flowing” off of you. How much did you spend on coffee over the past month? How much sleep did you get yesterday or the past week? Did you gain or lose weight this year? Look through your past billing statements, your iTunes listening history, or your car’s odometer.

    Do you have a certain number (or series of numbers) in your head now? What do you see?

  • The Bestiario design group seems to have been busy lately. Their latest project, TEDSphere, unsurprisingly, places the ever-so-popular TED talks series in a spherical space. You can watch TED talks from both inside and outside of the sphere, which is pretty cool.


    Talks are connected with lines to show relationships between lectures. At first I thought the lines connected talks with similar tags, but after clicking around, that doesn’t seem to be the case, so I’m not sure what the connections represent.

    Similar Look and Feel

    TEDSphere has a similar look and feel to Bestiario’s previous works with the 3D browsing and connections, which is nice and usually makes for a smooth browsing experience. I do wish the 3D environment were rendered a bit more smoothly, though. Edges and connecting lines always look so coarse. It’s probably a limitation of the Flash environment, but if that could be fixed, these 3D projects would look that much better and feel less alpha.

  • It’s time for a reader discussion, open thread, etc. Today’s question is:

    What are your favorite data visualizations in recent memory?

    It can be something I’ve posted or it can be something I missed. To get your memory going, you might want to go through the archives. Are there any visualizations that made you stop and go wow?

  • Trulia, the real estate search site, launched Trulia Snapshot today in collaboration with Stamen Design. It’s a pretty mapping interface that lets you view pictures of properties on a map in a very interactive way; it’s fun to use and super fluid.

    First, you type in the location where you want to find properties.


    From there you can browse properties by newest/oldest or most/least expensive with the map or with the histogram at the bottom.


    If you just want to sit back and watch, press play and the real estate properties will highlight automatically in the order you’ve selected, and the map will move back and forth by location. See something you like? Press pause. If not, just let the animations keep running, like your own personal real estate agent.

    My favorite part of the visualization is how the bottom images blur as you whiz by. It’s a very small part and not the focal point, but it’s one of those little design things that make it that much better. Nice touch.

    Ultimately, the success of such work is measured (although it shouldn’t need to be) by whether users would rather browse data with the visualization or with the usual listing pages. Give it a try. Which would you rather use?

  • Steven Wood’s thesis project, Tag Galaxy, is a beautiful piece of work that visualizes Flickr tags and pictures. Type whatever tag you want, and the results are organized with your tag as the sun and related tags as orbiting planets. Rotate and browse the galaxy to view pictures with the corresponding tag. Above is the result I got after entering “visualization”.

  • I’m headed to Washington tomorrow for the International Summit for Community Wireless Networks. Of the several sessions tomorrow, I’ll definitely attend Using Wireless Networks for Human Rights and Wireless Sensor Networks. It’s not so much the actual hardware or technical implementations I’m interested in. Rather, it’s what wireless networks (e.g. WiFi) can provide: a means to communicate and share information.

    There will of course be wireless there (I would hope), so I’ll be twittering during the event. I’m not sure what to expect. Either I will be really interested or super bored. Hopefully it’s the former.

  • The Internet has made it easier to donate to presidential campaigns, so much so that the Federal Election Commission has had a hard time keeping up with the seemingly sudden influx of data they have to process.

    The campaign finance reports filed by Obama and Clinton have grown so massive that they’ve strained the capacity of the Federal Election Commission, good government groups, the media and even software applications to process and make sense of the data.

    Hold up. Even computers are buckling under the pressure? The first things that came to mind were crashing servers and tech maintenance pulling their graying hair out. Reading on, though, it turns out “software” refers to Microsoft Excel 2003, which can’t handle worksheets with more than 65,536 rows or 256 columns.

    Phew, that was close. I mean, come on, this is nation-wide data. Give me a MySQL dump for Pete’s sake.

    Anyway, it’s certainly a good indicator of how times have changed data-wise. Excel 2007 can handle more (1,048,576 rows by 16,384 columns). And on that note, it’s still possible to open John McCain’s monthly reports in Excel 2003.
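    For anyone stuck with Excel 2003, one workaround is to split a large export into pieces that each fit on a single worksheet. Below is a minimal Python sketch using only the standard library; the function name `split_for_excel_2003` and the output file naming are illustrative, and it assumes the report is a CSV file with a single header row.

```python
import csv
import itertools

# Worksheet size limits in Excel 97-2003 (.xls format).
EXCEL_2003_MAX_ROWS = 65_536
EXCEL_2003_MAX_COLS = 256


def split_for_excel_2003(path, out_prefix, max_rows=EXCEL_2003_MAX_ROWS):
    """Split a CSV file into pieces that each fit on one Excel 2003 worksheet.

    The header row is repeated at the top of every piece, so each piece
    holds at most max_rows - 1 data rows. Returns the paths written.
    """
    written = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return written  # empty file: nothing to split
        if len(header) > EXCEL_2003_MAX_COLS:
            raise ValueError(
                f"{len(header)} columns exceeds Excel 2003's {EXCEL_2003_MAX_COLS}-column limit"
            )
        for part in itertools.count(1):
            # Pull the next batch of data rows; an empty batch means we're done.
            chunk = list(itertools.islice(reader, max_rows - 1))
            if not chunk:
                break
            out_path = f"{out_prefix}_{part:03d}.csv"
            with open(out_path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(out_path)
    return written
```

    Calling `split_for_excel_2003("report.csv", "report_part")` would write `report_part_001.csv`, `report_part_002.csv`, and so on, with the header repeated in each piece. Of course, a database (or that MySQL dump) avoids the problem altogether.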

    [Thanks, David]

  • Do you have a product or program aimed at statisticians, computer scientists, and/or designers that you want to place in front of thousands of data-minded individuals?

    Campaign packages for 125×125 pixel ads on the sidebar here at FlowingData are now available. The package includes prominent ad placement as well as a mention in a monthly thank you to the sponsors post.

    For more information on how to become a FlowingData sponsor, visit the details page or click the button in the sidebar.

  • Mobile technology has come a long way from those foot-long phones hooked up to a shoebox-sized battery pack. With Bluetooth, GPS, cameras, and Internet connections, mobile phones nowadays pack a lot of power. How can we put this functionality to use?

    Mobile Phones for Personal Data

    The technology to collect data about ourselves is available. We can record where we have been with GPS, and with cameras, we can keep track of what we have seen. We can then upload this data regularly with a persistent Internet connection, and what we end up with are travel patterns and live image streams.

    Putting Personal Data to Use

    Now things start to get super interesting. The challenge is to figure out what to do with all the data.

    • What do you do with a year’s worth of location traces or a year’s worth of pictures taken every few minutes?
    • What story can you tell and what inferences can you make?
    • Can you combine data from the phone with existing databases e.g. weather, environment, or traffic?
    • What type of visualization is more effective in making data available to non-expert users?
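    As one small, concrete starting point for the first question, here is a Python sketch that turns a raw location trace into distance traveled per day using the haversine (great-circle) formula. It assumes each fix is a `(timestamp, latitude, longitude)` tuple with a `datetime` timestamp, sorted by time; the function names are hypothetical, and a real GPS trace would also need filtering for noisy or missing fixes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_per_day(trace):
    """Sum the distance traveled on each calendar day.

    trace: list of (datetime, lat, lon) tuples sorted by time.
    Only consecutive fixes on the same day are linked, so the jump
    across midnight is not counted, and a day with a single fix is omitted.
    """
    totals = {}
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(trace, trace[1:]):
        if t0.date() == t1.date():
            day = t1.date()
            totals[day] = totals.get(day, 0.0) + haversine_km(lat0, lon0, lat1, lon1)
    return totals
```

    A year of daily totals like these is already something you could plot as a time series or a calendar heat map.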

    In the coming weeks I will be investigating these questions on this subject of self-surveillance, and if you don’t mind, will be bringing all of you along for the ride (towards completing my dissertation :).

    What would you do with location data or a continuous image stream from a year of your life?

  • Rows in a Field
    Photo by Duncan H

    One of the huge factors that drew me into statistics is that you can apply it to so many different areas of study. When someone asks me what the job market is like for someone in statistics, I always tell them, “Wherever there’s data, there’s a job for a statistician to fill. Marketing, biology, traffic, finance, crime…”

    It’s also my way of answering, “What are you going to do when you graduate?” In other words, I’m not sure yet. I keep running into more and more fun stuff I can do with my degree, so it’s hard to decide right now. But hey, it’s better to have too many paths to choose from than not enough, right?

    Interdisciplinary Statistics

    The most recent Amstat News has a short article, Statistics as an Interdisciplinary Science:

    An issue touched on briefly is statistics as an interdisciplinary science. I think there is a general agreement that (almost) all other scientific disciplines need statistics (and statisticians).

    Speaking to people outside of the field, there’s this idea that statistics is very focused (which it is in some ways, I guess) and very narrow, but it’s pretty much whatever you want it to be. You can focus completely on, say, crime, or you can be broader and examine issues in social science, for example.

    It’s like design or computer science. You might use your skills for very specific areas like page layout or web programming, but just as easily, you could use that know-how on a broad range of projects.

    In summary, statistics is awesome. What have you used statistics for lately?

  • It’s coming to the end of the academic year, which means there are lots of graduate students frantically finishing up their dissertations, defending, and earning their degrees (yay!). Here are some tasty visualization dissertations, new and old, worth thumbing through.

    Information Visualization for the People by Mike Danziger, Massachusetts Institute of Technology, Comparative Media Studies

    The Form of Facts and Figures by Christian Behrens, Potsdam University of Applied Sciences, Interface Design

    Practical Tools for Exploring Data and Models by Hadley Wickham, Iowa State University, Department of Statistics

    Visual Tools for the Socio–semantic Web by Moritz Stefaner, Potsdam University of Applied Sciences, Interface Design

    Computational Information Design by Ben Fry, Massachusetts Institute of Technology, Media Arts and Sciences

  • Bestiario, the group behind 6pli, recently put up their piece that maps informational distance between cities. At the base is a freely rotating globe. Arcs connect the cities, and each arc’s strength and height represent the strength of the relationship. The metric that determines the strength of a relationship takes several contexts into account: Google searches for individual cities, Google searches for cities together, and geographical proximity. Bestiario implemented the piece in ActionScript and used their own 3D framework (in Spanish).
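    Bestiario’s exact formula isn’t given here, so the following is only a hypothetical Python sketch of how those contexts could be combined: pointwise mutual information (PMI) of search hit counts measures how much more often two city names appear together than chance would predict, and an exponential decay turns distance into a proximity score. The 50/50 weighting, the 1,000 km scale, and the function names are all assumptions for illustration.

```python
import math


def co_search_score(hits_a, hits_b, hits_ab, total):
    """Pointwise mutual information (PMI) of two city names in search results.

    hits_a, hits_b: result counts for each city searched alone.
    hits_ab: result count for both cities searched together.
    total: assumed size of the searchable index.
    Positive when the names co-occur more often than chance would predict.
    """
    if hits_ab == 0:
        return float("-inf")  # never seen together
    return math.log((hits_ab * total) / (hits_a * hits_b))


def relationship_strength(hits_a, hits_b, hits_ab, total, distance_km,
                          scale_km=1000.0, weight=0.5):
    """Blend search co-occurrence with geographic proximity (hypothetical)."""
    # Proximity is 1.0 for the same location and decays toward 0 with distance.
    proximity = math.exp(-distance_km / scale_km)
    # Clamp PMI at zero so below-chance pairs don't produce negative strength.
    pmi = max(co_search_score(hits_a, hits_b, hits_ab, total), 0.0)
    return (1 - weight) * pmi + weight * proximity
```

    PMI is used here because raw co-occurrence counts would favor big, frequently searched cities; dividing by the individual counts asks whether two cities are linked beyond their popularity.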

    [Thanks, Santiago]

  • The U.S. Census Bureau released their 2008 Statistical Abstract, the National Data Book, not too long ago (um, like in January). There are state rankings and data in 30 categories and many more sub-categories. All this data is in the form of PDFs and Excel spreadsheets, which doesn’t lend itself to readability, but still, it’s nice to have access to all the information.

    Maybe FlowingData readers can put together a giant statistical abstract all conveyed through graphics. That would be cool. Above are six data sets that I picked from the billion or so available.

  • Popular Mechanics did a study on where it was safest to sit on an airplane based on all commercial jet crashes since 1971. Contrary to expert statements that “one seat is safe as the other,” the study found that it is safer to sit in the back.

    The funny thing about all those expert opinions: They’re not really based on hard data about actual airline accidents. A look at real-world crash stats, however, suggests that the farther back you sit, the better your odds of survival. Passengers near the tail of a plane are about 40 percent more likely to survive a crash than those in the first few rows up front.

    The percentages in the above graphic are survival rates.
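    Reading the “40 percent” claim as a relative comparison of survival rates, not a gap of 40 percentage points, and writing p_front and p_rear (labels introduced here) for the survival rates near the front and near the tail:

```latex
\frac{p_{\text{rear}} - p_{\text{front}}}{p_{\text{front}}} \approx 0.40
\quad\Longleftrightarrow\quad
p_{\text{rear}} \approx 1.4\, p_{\text{front}}
```

    For a purely hypothetical front-row survival rate of 50 percent, that would put rear survival at about 70 percent; the actual rates are the ones in the graphic.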

    [Thanks, Tim]

  • From elementary school through high school, I always used Microsoft Excel for my charts and graphs (and I still use it to clean data every now and then). In undergrad, I learned all of my programming in C++ and Java and did a little bit of engineering stuff in MATLAB. When statistics rolled along, I always analyzed data using R.

    Then I got into data visualization, and for a while it was all about Processing. When I interned for The New York Times, I used a lot of Adobe Illustrator (and still really enjoy playing with it). Lately, I’ve been immersed in Actionscript.

    So what do you use to make sense of data?

    If your weapon of choice isn’t listed, I’d be interested to know what your “other” tool is in the comments, because, well, there’s always more fun stuff to learn.
