• Nathan’s Annual Hot Dog Eating Contest – Kobayashi vs Chestnut

    July 3, 2008  |  Miscellaneous

    It's July 4th weekend, which means lots of burgers and hot dogs across America. It also means it's time for Nathan's annual hot dog eating contest on Coney Island. From 2001 through 2006, 144-pound Takeru Kobayashi dominated the competition, but last year Joey Chestnut brought the crown back to the States with 66 hot dogs and buns (HDBs) in 12 minutes. Who will take the crown this year? Will Kobayashi reclaim the title, or will Chestnut keep it in America? Oh, the suspense.

    Take a look at the history of the event - dating all the way back to 1916.

  • Weekend Minis For Your Lazy Weekend – 6/21/08

    June 21, 2008  |  Miscellaneous

    FlowingData on Alltop - Alltop describes itself as the digital magazine rack of the Internet collecting stories from "all the top" places on the Web. You'll now find FlowingData on both the Design and Science racks. While you're there, check out all the other cool sites.

    Excel Contest for Science and Engineering - Jon Peltier, a frequent FlowingData commenter, is running a contest on modeling science and engineering. The key phrase: "A winner will be drawn at random."

    Video Game Addicts Not Shy Nerds - A study "showed" that only 1% of problem gamers (in their sample) had poor social skills. What a load off my back.

    Surveying the Family Feud Surveys - The WSJ Numbers Guy takes a look at the 100-person surveys on the long-running game show. Survey says?!?

  • 5 Types of Data Visualization People – What Type Are You?

    June 6, 2008  |  Miscellaneous

    Data visualization means different things to different people. To some it's an analytical tool, while to others it's a way to make a statement. In my experience, those interested in data visualization fall into these five categories.

    The Technician

    Technicians are all about implementation. They have a strong programming background with experience in Processing, ActionScript, or some other similar language and have probably worked with large databases at one point or another. To technicians, aesthetics is not as important as getting things to work. Only after everything - database, hardware, code - is hooked together does the technician try to spruce things up. Show them a visualization and they'll want to know how it was made.

    The Analyzer

    Data is the priority for analyzers. As with technicians, aesthetics are not the greatest concern; rather, analyzers want to know the relationships between variables and find positive and negative trends, and they are most likely to tell you that you should have used a different type of graph or chart for that dataset. Tools like R, Microsoft Excel, and SAS are analyzers' weapons of choice. Many will have programming experience but don't code as well as technicians. Show an analyzer a visualization and they'll most likely comment on the (complex) patterns they see.

    The Artist

    Artists are obsessed with the final product - what the visualization will finally look like. They are the designers who are most likely to think long and hard about colors, visual indicators, and whether or not that square box should be moved 2 pixels up and to the left. Programming is not a strong point, but if it is, it's most likely in Processing. The weapon of choice, though, is the Adobe Creative Suite, namely Illustrator and Photoshop. Artists are most likely to tell you that something is ugly.

    The Outsider

    The outsider is the one with a complex data set who is not quite sure what to do with it. Outsiders are the field experts who want to visualize their data but might not have the know-how to follow through. They can, however, provide plenty of context and usually have a sense for what their data is about. You'll most often see the outsider with a pen and paper explaining things to the technician, analyzer, and artist.

    The Light Bulb

    Light bulbs are the idea people. They've got some programming, design, and analytical experience, but they're not necessarily experts in all three areas. Because of all the experience, the brighter bulbs can usually handle a large data visualization project on their own (if they have the time). Knowing what's possible and not possible, light bulbs lead projects and can delegate work across a team. It's all about the big picture for the bulbs, while the brightest are like the zen masters of data visualization.

    I consider myself some combination of the analyzer and technician. I'm still searching for the artist in me. I've got some design experience, but there's still a lot to learn - always more to learn.

    What data visualizer type are you?

  • I Heart Dilbert

    May 10, 2008  |  Miscellaneous

  • Weekend Minis for Your Lazy, Relaxing Weekend

    May 3, 2008  |  Miscellaneous

    Visualization Criticism - A criticism on the criticism on visualization. Robert Kosara, Fritz Drury, Lars Erik Holmquist, and David Laidlaw argue that we need to critique to further develop viz theory.

    Data Visualization Talks Online - Talks for your viewing pleasure from the likes of Ben Fry, Eric Rodenbeck, Jonathan Harris, and others. A couple hours of weekend learning.

    Why Things Cost $19.95 - An interesting article from Scientific American on the "psychological rules of bartering." Any guesses on this somewhat arbitrary pricing?

  • Chart of the Day: A Breakdown of Facebook Applications

    May 1, 2008  |  Miscellaneous

    Of the 23,160 Facebook applications, I use about 5, but I probably wouldn't notice if someone randomly removed all of them from my profile in the middle of the night. Kids these days. I used to play BlockStar, but haven't used it since it changed to Tetris (formerly BlockStar) and haven't played Scrabulous since my 1,000,000th consecutive loss. What Facebook applications do you use (or not use)?

    Speaking of Facebook, have you joined the FlowingData group yet?

  • Greatest Data Visualization of All Time

    April 1, 2008  |  Miscellaneous

    Let me introduce you to the greatest data visualization of all time. FlowingData readers, greatest data visualization of all time. Greatest data visualization of all time, FlowingData readers. It will blow your mind and affect you to your very core. I haven't felt this way since 1987 when I first started to walk.

    ...and OF COURSE the YouTube embed isn't working, so I guess the link will have to suffice. Ladies and gentlemen, be prepared to get up and dance. Here is the greatest visualization that you will ever see. You can thank me in the comments.

  • Warning: Nerdy Waters Ahead – Baby Got Stats and Too Logit

    March 29, 2008  |  Miscellaneous

    This just might be nerdy statistics overload even for me. A group from the Johns Hopkins biostatistics department has created parodies of Sir Mix-A-Lot's Baby Got Back and MC Hammer's Too Legit To Quit. For your listening pleasure - Baby Got Stats and Too Logit.

    The songs are in MP3 format, so you can put them on your iPod and play them over and over and over again. One play-through was enough for me, but clearly, it's only a matter of time before this biostat group hits the mainstream.

    [via Freakonomics]

    Update: Here's the video version for your viewing pleasure.

  • Save the Space Time Continuum – Do Not Exceed 88 Miles Per Hour

    March 22, 2008  |  Miscellaneous

    Billions of watts are wasted every year including 1955, 1985, and 2015. Be kind to the environment and keep your speed under 88 miles per hour. The space-time continuum appreciates it.

    Roads? Where we're going, we don't need roads.

  • Why Does Data Matter to Google?

    March 5, 2008  |  Miscellaneous

    Data is absolutely vital to Google's success; without data, Google is pretty much useless when it comes to search. Hal Varian explains on the official Google blog:

    Over the years, Google has continued to invest in making search better. Our information retrieval experts have added more than 200 additional signals to the algorithms that determine the relevance of websites to a user's query.

    So where did those other 200 signals come from? What's the next stage of search, and what do we need to do to find even more relevant information online?

    What an interesting question. I wonder what the answer is. Oh, here it is:

    Storing and analyzing logs of user searches is how Google's algorithm learns to give you more useful results. Just as data availability has driven progress of search in the past, the data in our search logs will certainly be a critical component of future breakthroughs.

    Cashing In On Data

    That's right. Without data, who knows where search would be now? AOL might still be prosperous. There's also this funny bit about how Larry and Sergey initially tried to license their algorithm to already existing search engines, but no one bit, so they made their own. You gotta respect the data!

    For more on the importance of data, you might also be interested in the ongoing series on FlowingData on why data matters.

  • Weekend Minis – Online Video, Visualization Types, Poverty, Digital Life

    February 23, 2008  |  Miscellaneous

    Weekend Treats

    A Tale of Two Types of Visualization and Much Confusion - Depending on who you talk to, data visualization can have very different meanings.

    It's Official. People Love Online Videos. Billions Of ‘Em. - 141 million unique viewers watched 10,156,199,000 videos this past December.

    Global Poverty Maps - Explores the political economy of aid, examining the contributions made by developed country governments and their role in development.

    My Trails Network - Inventing new ways to manage your digital life.

  • A Lesson in Recycling Chartjunk as Junk Art

    February 12, 2008  |  Miscellaneous

    This guest post is by Kaiser Fung, from Junk Charts and Data Matter. He answers my question - "What is data and why should we care about it?"

    Who's got more data? The largest retailer in the world or the largest library in the world?

    Walmart tends to over 500 terabytes of data (see here, here, etc.) while the Library of Congress, the largest according to the Guinness Book of World Records, has a paltry 20 terabytes, dwarfed by comparison.

    To hear it from data warehouse vendors, data mining academics, data savvy politicians, or data fixated citizens, Walmart versus the LOC is like New World versus Old World, the future versus the past, fast versus slow, wired versus tired.

    The more things change, the more they stay the same. The flood of data has not washed away these two age-old truisms.

  • Understanding Data, Not Just the Realm of Scientists in Ivory Towers

    February 11, 2008  |  Miscellaneous

    This guest post is by Hadley Wickham, a Statistics PhD candidate and a part of the GGobi team. He answers my question -- "What is data and why should we care about it?"

    For me, most data comes in the form of a data frame: a rectangular set of values with observations in rows and variables in columns. Most values are continuous (e.g. real numbers) or categorical (e.g. colours, treatments, subject ids), but are sometimes more esoteric (images, sounds, intervals). Each variable contains values of only one type and may also contain missing values. Missing values are particularly important for statisticians, and are often encoded as . or NA (encoding them as special numeric values, like 99, is generally a bad idea). Most data is "messy" and cleaning it up requires you to ensure that observations are in rows and variables in columns, as well as spending plenty of time to make sure that the values actually make sense (visualisation is really useful for this!).
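
    As a rough illustration of the point about missing values, here is a minimal Python sketch. The rows, the variable names, and the `column_mean` helper are all made up for this example and are not from the guest post; the sketch only shows how a sentinel like 99 can quietly distort a summary, while an explicit missing marker (`None` here, standing in for NA) can be skipped.

```python
from statistics import mean

# Hypothetical data frame: one dict per observation (row),
# one key per variable (column). All values are invented.
rows = [
    {"subject": "a", "treatment": "drug", "age": 34},
    {"subject": "b", "treatment": "placebo", "age": None},  # missing, stands in for NA
    {"subject": "c", "treatment": "drug", "age": 41},
]


def column_mean(rows, name):
    """Mean of a numeric column, skipping explicitly missing (None) values."""
    values = [row[name] for row in rows if row[name] is not None]
    return mean(values)


# The same data, but with the missing age encoded as the sentinel 99.
rows_sentinel = [
    dict(row, age=99 if row["age"] is None else row["age"]) for row in rows
]

na_mean = column_mean(rows, "age")                 # uses 34 and 41 only
sentinel_mean = column_mean(rows_sentinel, "age")  # 99 is treated as a real age

print(na_mean, sentinel_mean)
```

    With the explicit marker the mean age is computed from the two observed values; with the sentinel, the 99 is silently averaged in as if it were a real age.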

    Data Helps Illuminate Patterns

    To me, caring about the message in data is the essence of science, where we perform some action on the world and record its response in our data. This isn't just the realm of scientists in ivory towers, but something that we do every day, whether it's trying to understand the impact of a new marketing campaign, figuring out which house to buy, or exploring why a new cancer drug isn't working. Recording and examining the data that matters not only supports rational decision making, but also reveals the unexpected and helps illuminate underlying patterns.

  • Showing Historical & Cultural Connections and Mapping Influence

    February 8, 2008  |  Miscellaneous

    This guest post is by Mike Love, and he answers my question -- "What is data and why should we care about it?"

    Instead of answering in the general case, I'd be better off trying to answer it for an area of my interest.

    Historical Connections

    I think cultural history can be presented as data, and that we could get some benefit out of standardizing some atomic properties of cultural history. There are a couple of good efforts at doing this: Artandculture.com is an "interconnected guide to the arts," where you can see what movement artists and others belonged to. The Knowledge Web is a project of James Burke of the television show 'Connections'. They are working to encode tens of thousands of historical connections into a database. I have been working on a similar dataset at the open database project Freebase. Each of these projects has moved beyond text (and hypertext) and into the realm of data.

    One seemingly trivial advantage of data over text or text with hyperlinks: you can specify that making a connection between person A and person B implies a connection in the reverse direction. This cuts the workload in half: Wikipedians entering relationships into an infobox in Wikipedia have to do twice the work of a person working in a database framework.
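
    The idea that one entry implies its reverse can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `ConnectionGraph` class and its method names are hypothetical and do not describe how Freebase, the Knowledge Web, or Wikipedia actually store relationships.

```python
from collections import defaultdict


class ConnectionGraph:
    """Hypothetical store of symmetric connections between people."""

    def __init__(self):
        # Maps each person to the set of people they are connected to.
        self._links = defaultdict(set)

    def connect(self, a, b):
        """Record one connection; the reverse direction is implied and stored too."""
        self._links[a].add(b)
        self._links[b].add(a)

    def connections(self, person):
        """Return the people connected to `person`, sorted for stable output."""
        return sorted(self._links.get(person, set()))


graph = ConnectionGraph()
graph.connect("Person A", "Person B")  # entered once...

print(graph.connections("Person A"))
print(graph.connections("Person B"))   # ...and visible from both sides
```

    The editor enters the relationship once, and a lookup from either person finds it, which is the halved workload described above.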

    Apply Relationships

    Influence Graph

    The more exciting advantage is the kind of applications that are possible once you have settled on a set of relationships. The team working on the Knowledge Web built a graph browser which embeds historical figures in their century and draws lines between these figures. Mousing over a line brings up some descriptive information about the relationship. A team at Metaweb built a graph browser which pulls up pictures of historical figures and lays out their influences and influencees in a circle surrounding them. You can imagine filtering in other ways: show all the connections between artists and writers; show all the cross-cultural connections between China and Europe. You could plug historical data into a recommendation system as well.

    There is nothing new about documenting cultural connections. There are many better, probably more reliable books that serve this purpose. (For Western history, I recommend Richard Tarnas' The Passion of the Western Mind, and Peter Watson's The Modern Mind.) But to design a dynamic interface to these books would require parsing the English language. Maybe we can do this too.

  • Increasing Data Literacy Across the General Public With Truth and Beauty

    February 7, 2008  |  Miscellaneous

    Matthew Hurst, from Microsoft Live Labs and the co-creator of BlogPulse, answers my question - "What is data and why should we care about it?"

    In writing this brief article, I tried to answer the following: what attracts me to data?

    An Abundant Resource

    Data is everywhere - from the streams of posts in the blogosphere to stock trading graphics spilling from news media to science projects in kindergarten. It permeates our modern world, and yet few of us are equipped to interpret it critically. More importantly, few of us are protected against the misuse and manipulation of the truth via data. Users of databases (who include the millions of users of search engines) are slowly but surely becoming exposed to more sophisticated views of data, and thus the average data literacy will, hopefully, increase.

    Working in the field of data mining is very exciting at this time as it has the potential to truly impact the perception and understanding of the world-as-data. Sites like Swivel and Many Eyes are in some sense at the cutting edge of this progression, with major public databases (like search engines) nervously following their lead.

    A fundamental challenge in empowering users with data is the legacy of impoverished tools. Currently, one is required to make many low level interactions in order to synthesize a result required for a task. Consequently, the tools and infrastructure around data interactions have moved towards high volume, immediate response paradigms. However, the added value, increased accuracy and relevance of more sophisticated processes, and the additional investment required on the part of the user to learn how to consume and manipulate enhanced data displays, come at a cost. To make the jump, users will have to be convinced of the value of enhanced interactions and displays, spend a little more time working with the data, and so on.

    A Vector of Truth

    Data, if collected and analysed correctly, can support or refute our intuitions and beliefs. In addition, the analysis of data can hint at some very human structures such as those found in language and in the ways in which we conceptualize the world. Data may be used to help us understand our environment. By working with data, we can grasp better models of ourselves and our world.

    Beauty in Exploration

    Visualization is an essential tool for understanding data and drawing inferences from it. The last ten years of advances in computer performance and graphical displays have opened up the possibilities for displaying data in rich and dynamic ways. This has led practitioners down a dangerous path balanced between aesthetics - the visual impact and design of data display - and utility - the capability of a visualization to intuitively and efficiently assist the user. That being said, the aesthetics of data visualization can play a huge part in attracting users to the topic being visually described, encouraging them to ask 'it's pretty, but what is it?' Hopefully the answers to that question will lead to better understanding on all fronts.

  • Data Makes Reasonable Decision-making Possible

    February 6, 2008  |  Miscellaneous

    This guest post is by Andrew Gelman from Statistical Modeling, Causal Inference, and Social Science. He answers the question - "What is data and why should we care about it?"

    Good data are better than bad data, but worst of all are data whose quality you can't assess. Beyond this, we want to use statistical methods that allow us to combine data from many sources. I'm comfortable with regression and multilevel models, but other methods are out there too. In any case, we have to care about our data because inferences and decisions are just about always data-based, implicitly if not explicitly. Being the person in the room with the hard data gives you authority, as well it should.

  • May the Data Be With You, Young Skywalker

    February 4, 2008  |  Miscellaneous

    In response to my question, "What is data and why should we care about it?" - Zach Gemignani from Juice Analytics answered:

    Obi-Wan Kenobi could have been speaking about data in businesses when he said: "It's an energy field created by all living things. It surrounds us, and penetrates us. It binds the galaxy together."

    Data is the residue of every action and interaction that takes place in a company, with customers, and in the marketplace. Businesses have created complicated and effective nets to capture this data as it flies off in all directions. Unfortunately, mountains of data mean nothing. Like young Luke Skywalker's inability to control The Force, a company's inability to make use of data is nothing more than frustration and untapped potential.

    Making use of data takes a subtle combination of capabilities. It takes experience and context about the business, speed and skill to manipulate data, and an ability to visualize and communicate results. Data in the wrong hands is useless if not dangerous; in the right hands data can transform into new insights and informed decisions.

  • What is Data and Why Do We Care About it So Much?

    February 4, 2008  |  Miscellaneous

    I've been fortunate to have worked with people from lots of different fields - statistics, ecology, computer science, engineering, design, etc. If I've learned anything, it's that everyone has a different idea of what data is and why it matters.

    I've found that until I've understood what my collaborators mean by data and what they (and I) are trying to get out of a dataset, it's near impossible to get anything useful done.

    To make things a bit more clear (and for my own enjoyment), I asked a select group of people a single question:

    What is data and why should we care about it?

    Those who responded are from different areas of expertise, ranging from statistics, to business, to computer science, to design. Some names you'll recognize while others will be new to you. All are doing interesting things with data.

    I've been looking forward to this series for a couple of weeks now, and my hope is that you will gain a better understanding about what data is and how people are putting it to use. Keep an eye out for posts with the black square image above.

    Here is who has answered so far:

    If you'd like to answer the question yourself, I'd love to see your response too, or if you write an answer on your own blog, please do post the link in the comments below.

  • Is My Interest in Data Obsessive?

    October 4, 2007  |  Miscellaneous

    I just saw Stranger than Fiction. The main character, Harold Crick, spends much of his life counting. He counts the number of steps it takes for him to walk from his home to the bus stop; he brushes his teeth 76 times every morning; he takes a 45.7-minute lunch break and a 4.3-minute coffee break.

    So much counting and tracking. Sounds kind of familiar. Maybe a little too familiar? Nah.

    71 words. 320 characters. Nine sentences. Wait, now ten. Eleven. Err, twelve...

  • Overgeneralizing on Chinese Takeout

    September 20, 2007  |  Miscellaneous

    My roommate pointed out a couple of weeks ago that I always get Chinese takeout for dinner; however, we never get home at the same time, and most days, she's not even in the apartment when I arrive. How could she, a very bright and educated individual, come to such a conclusion after seeing so little data?

    In fact, by my count, she only saw me bring home Chinese takeout twice before she decided that yes, I do in fact eat Chinese every single day of the week. In reality I rotate through four choices -- sandwiches, Japanese, pizza, or Chinese -- with a few ventures out every now and then. This week I've had Japanese, hot dogs, Mediterranean twice, a sandwich, a burger, and Chinese.

    This is one of the reasons we need statistics. What we perceive isn't always the truth. I might have had Chinese takeout on Monday and Friday, but do you know what I had on the days in between? If not, can you make an educated guess?

Unless otherwise noted, graphics and words by me are licensed under Creative Commons BY-NC. Contact original authors for everything else.