• FlowingData reached a long-awaited milestone yesterday – 1,000+ subscribers. Thank you to everyone who has subscribed to, commented on, and linked to FlowingData. To extend my thanks, I’m running a (very easy) contest to win Edward Tufte’s milestone book – The Visual Display of Quantitative Information.

    Reaching 1,000 Subscribers

    I started FlowingData ten months ago not really knowing what to do with it. My new hosting plan came with a free domain, so I thought, hey, FlowingData. One month in, I started this blog to convince people that statistics was more than their least favorite class in college.

    Since then, my goal has been to reach 1,000 subscribers: to find 1,000 people interested in data, or to get them interested in it. This happened yesterday. FlowingData now has 1,025 subscribers – thanks to all of you!

    How to Win Visual Display

    So now I’m going to make it really easy for you to win a copy of Edward Tufte’s Visual Display. All you have to do is leave a comment on any new post (this one included) during the next 10 days. On March 31, I’ll randomly select a winner. The more comments you leave, the higher your chances of winning.

    The comment should add to the conversation, and trackbacks and pingbacks don’t count. That’s it. If you have a valid mailing address that Amazon can deliver to, you can win. Oh, and make sure you leave a valid email address so that I can contact you when you win. Good luck and again, thank you, everyone.
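
    For the curious, here is a minimal sketch of how such a draw could work, assuming each qualifying comment counts as one entry. The commenter names are made up, and this is not necessarily how the contest winner was actually picked.

```python
import random
from collections import Counter


def win_probabilities(comment_authors):
    """Return each commenter's chance of winning, given one entry per comment."""
    counts = Counter(comment_authors)
    total = len(comment_authors)
    return {name: n / total for name, n in counts.items()}


def pick_winner(comment_authors, rng=None):
    """Pick one winner; choosing uniformly over comments weights by comment count."""
    if not comment_authors:
        raise ValueError("no qualifying comments")
    rng = rng or random.Random()
    return rng.choice(comment_authors)


# Hypothetical list of qualifying comments (one item per comment, by author).
entries = ["ana", "ben", "ana", "cy", "ana"]

print(win_probabilities(entries))  # {'ana': 0.6, 'ben': 0.2, 'cy': 0.2}
print(pick_winner(entries))
```

    Drawing uniformly over comments, rather than over people, is what makes extra comments raise your odds.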

    Next Step: 5,000 Subscribers

    Let’s get more people talking about data and visualization and find those who already are. I know that there are hundreds of thousands of people we haven’t reached yet. If you could take a few seconds to email one friend about FlowingData, I will super appreciate it.

  • Email has grown to be a huge part of our lives and is very much commonplace. We can connect with others in just a few clicks. With all the email sent per day, how can we understand these connections? How can we visualize the type of email we’ve been sending? Can we tell a story somehow with the thousands of emails we’ve sent, received, and deleted?

    These 21 email visualizations investigate. I’ve split them up into six categories – exploratory, analytic, mapping, metaphor, networks, and abstract.
    Read More

  • I stumbled across this dataset covering piracy of Oscar-nominated films over the last 6 years and a short analysis.

    Piracy by the Numbers

    Despite the Academy’s efforts to crack down on bootlegging, its attempts haven’t accomplished much. Focus on stopping one area, like downloading, and another just grows more prolific, like Region 5 DVDs from overseas. A quick search in the right places will show you that piracy isn’t going away any time soon.

    I even met someone whose job it was to find people who were “seeding” films through BitTorrent and to report them to police. I got the impression that it was a really tedious process and that people go uncaught most of the time. I’m, uh, not condoning this, but if you don’t want to get caught, just make sure you stop the torrent once you’ve got your file.

    Bootlegging on Seinfeld

    Bootlegging always reminds me of the Seinfeld episode in which Jerry somehow gets caught up in a bootlegging scheme:

    [T]here was a kid couldn’t have been more than ten years old. He was asking a street vendor if he had any other bootlegs as good as Death Blow. That’s who I care about. The little kid who needs bootlegs, because his parent or guardian won’t let him see the excessive violence and strong sexual content you and I take for granted.

    For those interested (and I know you are), the term bootleg originates from hiding flasks of liquor in the legging of boots. Ahoy, matey.

    Photo by mumelopics

  • Two weeks ago, I vowed to stop procrastinating using two strategies:

    1. Make a to-do list every night to lay out what will get done the next day
    2. Enable the Greasemonkey script – Invisibility Cloak – which will block all the sites that I waste too much time on except during lunch and on the weekend

    Down You, Procrastination

    Since I enabled the plugins and started to-do lists, my browsing time has gone down a whopping 3.5% – from 10.11 hours per day to 9.76 hours per day. Ok, it doesn’t sound like much, but there’s a bit more to the story.
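
    As a quick check on the arithmetic, the 3.5% figure is just the drop in hours divided by the starting hours. The function name below is made up for illustration.

```python
def percent_decrease(before, after):
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100


hours_before = 10.11  # average browsing hours per day before the experiment
hours_after = 9.76    # average browsing hours per day two weeks in

drop = percent_decrease(hours_before, hours_after)
minutes_saved = (hours_before - hours_after) * 60

print(f"{drop:.1f}% less browsing, about {minutes_saved:.0f} minutes per day")
# prints: 3.5% less browsing, about 21 minutes per day
```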

    Growing More Productive

    Even though the time decrease isn’t much, I’ve still been more productive than when I wasn’t trying to improve. Since all of my favorite sites – Facebook, Google Reader, this blog – are blocked during the day, I spend more time reading papers and researching stuff I’m supposed to be looking for.

    Planning to Improve More

    Productivity has gone up, but there’s still room for improvement. There have been days when I did not feel like working, so I cheated, and turned off the plugins and scratched the to-do list. As a result, I wasted a lot of time.

    On the days I feel blah, I’m going to avoid turning off the plugins and see where that takes me. I will also work on creating more specific to-do lists the night before, because when I put in vague tasks like “go over papers,” they didn’t really get done. However, if I put in “read paper X, paper Y, and summarize each,” then it usually got done.

    Failed Tactic

    I also tried hiding the dashboard (I have a Mac) so that I couldn’t see that I had new emails, but that just (as embarrassed as I am to admit) left me wondering more. I would keep checking, which seemed to waste more time.

    I’ll put in my final report in two weeks.

    How’s everyone else doing?

  • Jose Luis Vicente and Irma Vilà, in collaboration with Bestiario, have created an interactive installation in Flash that allows you to explore the radio spectrum – the electromagnetic space covering signals from radio and television to GPS, Bluetooth, and mobile phones. The piece represents a database of projects and services (in the radio spectrum) developed over the past decade.
    Read More

  • Hi, Boing Boing readers. Welcome to FlowingData. For the new visitors, here’s the rundown (and for the old visitors, welcome back). My name is Nathan, and I’m a statistics graduate student / computer science graduate obsessed with data and visualization. Here on FlowingData I cover how statisticians, computer scientists, designers, and other experts use data to help us better understand ourselves and our surroundings.

    For more details, check out the about page and feel free to contact me if you have any questions. If you like what you see, you might want to subscribe to the feed.

    Again, thanks to David and Boing Boing for linking here, and again, thanks to Mike for making the suggestion!

  • In light of the MySpace photo breach (due to their negligence) a couple…

  • I just created a new Twitter account, and it got me thinking about all the data visualization I’ve seen for tweets. I felt like I’d seen a lot, and it turns out there are quite a few. Here they are grouped into four categories – network diagrams, maps, analytics, and abstract.

    Network Diagrams

    Twitter is a social network with friends (and strangers) linking up with each other and sharing tweets aplenty. These network diagrams attempt to show the relationships that exist among users.

    Twitter Browser

    Twitter Social Network Analysis

    The ebiquity group did some cluster analysis and managed to group tweets by topic.

    Twitter Vrienden

    Twitter in Red

    I’m not completely sure how to read this one. It looks like it starts from a single user and then shoots out into the network.

    Twitter Network

    Read More

  • Wired Magazine recently did a feature on data-driven art.

    The above image is Jason Salavon’s work that shows U.S. population by county. The technically minded readers might be thinking, “I don’t get it. What am I seeing here? I don’t even know which county has the greatest population.” I understand where you’re coming from, but hey, it’s art, not a status update.
    Read More

  • I’ve dabbled quite a bit throughout my academic career. I started in computer science, then electrical engineering, and then statistics. I also considered a future in business, environmental science, civil engineering, and urban planning, but I’ve finally settled on a combination of statistics and design — data visualization.

    Here are the 4 visualizations that got me interested and left me wanting more.
    Read More

  • Area Codes by Ludacris

    I thought this map was amusing. As you can see, Mr. Bridges prefers those in the southeast and northeast, according to his 2001 hit single, Area Codes, in which he raps about all the female friends he has made.

    This is yet another example of the ubiquity of data. If you can find hoe data in Ludacris’ Area Codes, you can find data anywhere. Here’s the large version of the above map. By the way, I’m sorry if I’ve offended anyone with this hoe data. Hoe data.

    [via Strange Maps]

  • New York Talk Exchange – Illustrates the global exchange of information in real time by visualizing volumes of long distance telephone and IP (Internet Protocol) data flowing between New York and cities around the world.

    A Week in the Life – A data sculpture made out of cardboard representing movement and communication from a cell phone in one week to increase awareness of the German Telecommunications Data Retention Act.

    National Gruntledness Index – A heat map showing where in the United States most people are, um, gruntled. Is this for real? Somehow I don’t think the entire country is pissed off.

    Looking for a Design Job in China? – Danwei is looking for a smart, skilled creator who can present raw economic data in a very visual way.

  • Facebook

    I started a FlowingData Facebook group a couple of weeks ago, and I guessed that about two people would join. I was slightly off, and we’re up to 92 now (plus me), which makes me happy. Thank you for making me happy, you 92 people :).

    I do have one more favor to ask of you. If there’s anything you find interesting – data sets, visualization, art pieces, analyses, posts from your own blogs – please do post to our group. My hope is that FlowingData will grow into more of a community than just me on my soap box. As much as I like hearing myself talk, I like listening to what others have to say a lot more.

    If you haven’t joined the FlowingData Facebook group yet, I highly encourage it. We come from all areas of the data world, from statistics, design, and computer science to education, psychology, economics, and many others, which makes for very good conversation.

  • United Nations Data Logo

    For our Humanflows project, we used the United Nations Common Database for our demographic numbers. Anyone who has used the common database knows that it’s not especially user-friendly. You have to go through a series of non-intuitive dropdown menus to get the data you want. You then have to decipher the downloaded data’s CSV format. The recently released UNdata relieves a lot of these problems.
    Read More

  • Computation+Journalism

    These highlights from Journalism 3G are pretty overdue, but better late than never. Here’s what I thought was most interesting.

    Sensemaking and Information Visualization

    Naturally, my ears perked up on the second day when the sensemaking and information visualization panel began. Jeff Heer, who I’ve referred to a few times before, was the standout of the group. His presentation was, for the most part, on his paper with Fernanda B. Viégas and Martin Wattenberg, Voyagers and Voyeurs: Supporting Asynchronous Collaborative Information Visualization. It’s a pretty good read that covers topics like Vizster and the pre-Many Eyes project sense.us.

    Vizster by Jeffrey Heer

    However, it wasn’t so much the material that was so interesting. It was the way Heer presented his material that captivated the audience. From the static visualizations to the animated ones, it was another great example of how powerful visualization can be.

    John Stasko from Georgia Tech also had some fun visualization work to show. His presentation was more of an overview of why journalists should care about visuals. As chair of last year’s InfoVis conference, he did a good job.

    Journalistic Video Games

    Fatworld by Persuasive Games

    Ian Bogost gave an interesting talk on the role of video games in journalism. The focus was mostly on his work with Persuasive Games:

    Our games influence players to take action through gameplay. Games communicate differently than other media; they not only deliver messages, but also simulate experiences. While often thought to be just a leisure activity, games can also become rhetorical tools.

    Think games are just for fun? Think again.

    One thing that Bogost said stuck with me. He said that video games are usually bad at telling stories. Many games put up a road sign for an issue but don’t really go any further than that. Persuasive Games tries to go deeper to make players think about the issues presented.

    We can say this about a lot of data visualization projects out there (you know which I’m talking about); they try to make a statement but don’t really go into the why or how we can change.

    Citizen Scientists

    Finally, there was Mark Hansen, who was actually the first speaker of the conference (and happens to be my adviser). Hansen talked about his recent work with Ben Rubin at The New York Times building and moved on to citizen science.

    Brad Stenger did a good job summarizing Hansen’s talk in his detailed recap on infosthetics, but the main point to take away is that citizens play an important role in data collection and reporting. Over time, as technology advances, citizen science will only play a larger role in ubiquitous journalism: collecting, analyzing, and making use of data.

    Lasting Impressions

    The Journalism 3G coordinators put together a very good set of talks covering a lot of different areas. As journalism spreads outside of the conventional paper, it’s clear that collaboration between journalists and techs is vital to future success.

  • Data is absolutely vital to Google’s success; without data, Google is pretty much useless when it comes to search. Hal Varian explains on the official Google blog:

    Over the years, Google has continued to invest in making search better. Our information retrieval experts have added more than 200 additional signals to the algorithms that determine the relevance of websites to a user’s query.

    So where did those other 200 signals come from? What’s the next stage of search, and what do we need to do to find even more relevant information online?

    What an interesting question. I wonder what the answer is. Oh, here it is:

    Storing and analyzing logs of user searches is how Google’s algorithm learns to give you more useful results. Just as data availability has driven progress of search in the past, the data in our search logs will certainly be a critical component of future breakthroughs.

    Cashing In On Data

    That’s right. Without data, who knows where search would be now? AOL might still be prosperous. There’s also this funny bit about how Larry and Sergey initially tried to license their algorithm to existing search engines, but no one bit, and so they made their own. You gotta respect the data!

    For more on the importance of data, you might also be interested in the ongoing FlowingData series on why data matters.

  • Santiago, who I met at the Visualizar workshop, forwarded me his work on the visualization of del.icio.us tags and bookmarks called 6pli. Normally, I’m not a big fan of network diagrams, because I always seem to get lost in all the nodes and edges cluttering up the place. I feel differently about 6pli though.

    6pli sets itself apart with really smooth, responsive interaction and three views – elastic net 3-d, elastic net 2-d, and circle 2-d. All three views rely on a metric of tag-similarity. The more co-tags a single tag shares with its neighbors, the closer together those tags appear.

    Was that confusing? OK, it’ll be more clear with pretty pictures.

    Elastic Net 3-D

    The elastic net 3-D (pictured above) shows tags and bookmarks in a 3-dimensional view. Tags are in rectangles and bookmarks are circles. The more tags two bookmarks (circles) have in common, the closer together they sit. Similarly, if a tag is often grouped with other tags, it will appear closer to that group. Click on a tag, and a list of bookmarks shows up on the right.
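
    To make the idea of tag-similarity a bit more concrete, here is a rough sketch. 6pli’s actual metric isn’t described here, so this swaps in Jaccard similarity (shared bookmarks divided by all bookmarks carrying either tag) as a stand-in. The bookmark names, tags, and function name are all made up.

```python
from itertools import combinations


def tag_similarity(bookmarks):
    """Return the Jaccard similarity for every pair of tags.

    `bookmarks` maps a bookmark name to the set of tags applied to it.
    """
    # Invert the mapping: for each tag, which bookmarks carry it?
    tag_to_marks = {}
    for mark, tags in bookmarks.items():
        for tag in tags:
            tag_to_marks.setdefault(tag, set()).add(mark)

    sims = {}
    for a, b in combinations(sorted(tag_to_marks), 2):
        shared = tag_to_marks[a] & tag_to_marks[b]
        either = tag_to_marks[a] | tag_to_marks[b]
        sims[(a, b)] = len(shared) / len(either)
    return sims


# Hypothetical bookmarks and tags, for illustration only.
bookmarks = {
    "bookmark_a": {"data", "visualization", "statistics"},
    "bookmark_b": {"data", "visualization", "design"},
    "bookmark_c": {"statistics", "programming"},
}

sims = tag_similarity(bookmarks)
for pair, score in sorted(sims.items(), key=lambda item: -item[1])[:3]:
    print(pair, round(score, 2))
```

    A layout like the elastic net could then pull high-scoring pairs together, for example by treating the score as spring strength, though that part is a guess too.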

    The cool part is when you start playing with the 3-D network blobby. You can rotate it like a globe and the movement is controlled by spring action. The visualization’s response is immediate and really smooth with nice transitions from one view to the next, unlike this paragraph.

    Elastic Net 2-D

    Elastic Net 2D

    The 2-dimensional view is the same principle as the 3-D. The only difference is the 2-D is a projection of the 3-D view onto a flat plane. Smooth interaction still applies here.

    Circle 2-D

    Circles 2D

    Finally, the circle view arranges tags and bookmarks into their del.icio.us bundles. Each circle is divided homogeneously and the radius of the circle can be manually modified.

    One thing I would recommend for the beta release is some kind of input to type in a tag or the name of a bookmark. Right now, the starting point feels kind of random, but if I could specify where I wanted to explore, I think the viz would be that much more useful.

    Check out my 6pli del.icio.us tags viz here.

  • Clock by ToniVC

    I waste way too much time doing completely useless stuff when I should be working on my dissertation, reading papers, writing papers, and learning things that will bring me closer to my degree. I’m ready to stop procrastinating.

    How I Will Become More Productive

    In an attempt to work more efficiently, I am going to take up Seth’s self-experimentation offer that I found via Andrew’s post. I am going to self-experiment; I am going to collect data about myself; and I am going to find out if my two-pronged method to stop procrastination works. Here’s my plan:

    1. I will make a to-do list every night to lay out what will get done the next day
    2. I will enable the Greasemonkey script – Invisibility Cloak – which will block all the sites that I waste too much time on except during lunch and on the weekend

    How I Will Judge Improvement

    To measure my progress, I will make use of two Firefox plugins – Browser Statistics and TimeTracker. The former keeps track of the amount I’ve downloaded (in megabytes) while the latter is a timer for time spent browsing the Web.

    Luckily I’ve had these two plugins enabled for a little over a month, so at the end of this month, there will be something to compare to. From January 27 to March 2, I downloaded 23,524.73 megabytes and spent a whopping 364 hours browsing. That’s about 653 megabytes and a little over 10 hours per day. OK, that’s embarrassing.
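
    The per-day numbers above can be reproduced with a few lines, assuming the year is 2008 (a leap year, which the post doesn’t state) and that both January 27 and March 2 count. Under those assumptions the span is 36 days. The function name is made up.

```python
from datetime import date


def daily_averages(start, end, total_mb, total_hours):
    """Return (days, MB per day, hours per day) over an inclusive date range."""
    days = (end - start).days + 1  # +1 so both the start and end dates count
    return days, total_mb / days, total_hours / days


# The year 2008 is an assumption; the totals come from the paragraph above.
days, mb_per_day, hours_per_day = daily_averages(
    date(2008, 1, 27), date(2008, 3, 2), total_mb=23524.73, total_hours=364
)

print(f"{days} days: {mb_per_day:.0f} MB and {hours_per_day:.2f} hours per day")
# prints: 36 days: 653 MB and 10.11 hours per day
```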

    Join Me In This Self-experiment

    I’ll do this for one month with a midway report on March 17 and a final report on March 31. You can subscribe to the feed to stay updated, and if anyone wants to join me on this, all the better. Just leave a comment below so that we can keep track of results.

    Procrastination-free days start now.

  • I stumbled across a data table from the Social Security Administration that shows the probability of death. It’s an actuarial life table estimating the probability that you will die within one year given your age.
    Read More

  • Jonathan Harris and Sep Kamvar collaborated again on their featured piece at the New York Museum of Modern Art’s Design and the Elastic Mind exhibit. Similar in flavor to their previous work, I Want You to Want Me explores the search for love and for self in the online dating world, using data collected from various online dating sites every few hours.
    Read More