• Showing Historical & Cultural Connections and Mapping Influence

    February 8, 2008  |  Miscellaneous

    This guest post is by Mike Love, and he answers my question -- "What is data and why should we care about it?"

    Instead of answering in the general case, I'd be better off trying to answer it for an area of my interest.

    Historical Connections

    I think cultural history can be presented as data, and that we could benefit from standardizing some atomic properties of cultural history. There are a couple of good efforts at doing this. Artandculture.com is an "interconnected guide to the arts," where you can see which movements artists and others belonged to. The Knowledge Web is a project of James Burke, of the television series 'Connections'. Its team is working to encode tens of thousands of historical connections into a database. I have been working on a similar dataset at the open database project Freebase. Each of these projects has moved beyond text (and hypertext) and into the realm of data.

    One seemingly trivial advantage of data over text (or text with hyperlinks) is that you can specify that a connection from person A to person B implies a connection in the reverse direction. This cuts the workload in half: Wikipedians entering relationships into a Wikipedia infobox have to do twice the work of someone working in a database framework.
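    To make the halved workload concrete, here is a minimal Python sketch of a connection store that records each fact once and derives the reverse direction automatically. The relation names (`influenced`, `influenced_by`, `peer_of`) and the `ConnectionStore` class are hypothetical, not the actual schema of Freebase or the Knowledge Web; the Socrates/Plato pair is only an illustration.

```python
from collections import defaultdict

# Hypothetical relation vocabulary: each relation maps to its inverse.
INVERSES = {
    "influenced": "influenced_by",
    "influenced_by": "influenced",
    "peer_of": "peer_of",  # a symmetric relation is its own inverse
}

class ConnectionStore:
    """Hypothetical store: one entry per fact, reverse fact derived."""

    def __init__(self):
        # (person, relation) -> set of related people
        self._edges = defaultdict(set)

    def add(self, a, relation, b):
        """Record 'a relation b' and, automatically, its inverse."""
        self._edges[(a, relation)].add(b)
        self._edges[(b, INVERSES[relation])].add(a)

    def related(self, person, relation):
        """Sorted list of people linked to `person` by `relation`."""
        # .get() does not trigger defaultdict's factory, so lookups
        # for unknown people do not create empty entries.
        return sorted(self._edges.get((person, relation), set()))

store = ConnectionStore()
store.add("Socrates", "influenced", "Plato")  # entered once...
print(store.related("Plato", "influenced_by"))  # ...readable both ways
```

    The inverse table is the design choice that does the work: a directional relation like influence gets a distinct reverse name, while a symmetric one simply points to itself.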

    Apply Relationships

    Influence Graph

    The more exciting advantage is the kind of applications that are possible once you have settled on a set of relationships. The team working on the Knowledge Web built a graph browser which embeds historical figures in their century and draws lines between these figures. Mousing over a line brings up some descriptive information about the relationship. A team at Metaweb built a graph browser which pulls up pictures of historical figures and lays out their influences and influencees in a circle surrounding them. You can imagine filtering in other ways: show all the connections between artists and writers; show all the cross-cultural connections between China and Europe. You could plug historical data into a recommendation system as well.
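    The filters described above (artists with writers, China with Europe) amount to keeping only the edges whose two endpoints satisfy some predicate. A minimal Python sketch, assuming each figure carries hypothetical `field` and `region` attributes; the figures and connections below are placeholders, not real historical data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Figure:
    name: str
    field: str   # hypothetical category, e.g. "artist" or "writer"
    region: str  # hypothetical category, e.g. "China" or "Europe"

def filter_connections(edges, keep):
    """Return only the (a, b) edges for which keep(a, b) is True."""
    return [(a, b) for a, b in edges if keep(a, b)]

def between_fields(f1, f2):
    """Keep edges joining a figure in field f1 to one in field f2."""
    return lambda a, b: {a.field, b.field} == {f1, f2}

def between_regions(r1, r2):
    """Keep edges joining a figure in region r1 to one in region r2."""
    return lambda a, b: {a.region, b.region} == {r1, r2}

# Placeholder figures and connections, for illustration only.
painter = Figure("Figure A", "artist", "Europe")
poet = Figure("Figure B", "writer", "China")
critic = Figure("Figure C", "writer", "Europe")
edges = [(painter, poet), (painter, critic), (poet, critic)]

artists_and_writers = filter_connections(edges, between_fields("artist", "writer"))
china_and_europe = filter_connections(edges, between_regions("China", "Europe"))
```

    Because each view is just a predicate, another filter (say, connections within one century) would need only one more small function. This sketch ignores edge direction and layout, which a real graph browser would also have to handle.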

    There is nothing new about documenting cultural connections. There are many better, probably more reliable books that serve this purpose. (For Western history, I recommend Richard Tarnas' The Passion of the Western Mind, and Peter Watson's The Modern Mind.) But designing a dynamic interface to these books would require parsing the English language. Maybe we can do that too.

  • Increasing Data Literacy Across the General Public With Truth and Beauty

    February 7, 2008  |  Miscellaneous

    Matthew Hurst, from Microsoft Live Labs and the co-creator of BlogPulse, answers my question - "What is data and why should we care about it?"

    In writing this brief article, I tried to answer the following: what attracts me to data?

    An Abundant Resource

    Data is everywhere - from the streams of posts in the blogosphere to stock trading graphics spilling from news media to science projects in kindergarten. It permeates our modern world, and yet few of us are equipped to interpret it critically. More importantly, few of us are protected against the misuse and manipulation of the truth via data. Users of databases (who include the millions of users of search engines) are slowly but surely becoming exposed to more sophisticated views of data, and thus average data literacy will, hopefully, increase.

    Working in the field of data mining is very exciting right now, as it has the potential to truly impact the perception and understanding of the world-as-data. Sites like Swivel and Many Eyes are in some sense at the cutting edge of this progression, with major public databases (like search engines) nervously following their lead.

    A fundamental challenge in empowering users with data is the legacy of impoverished tools. Currently, one is required to make many low-level interactions in order to synthesize the result a task requires. Consequently, the tools and infrastructure around data interactions have moved toward high-volume, immediate-response paradigms. More sophisticated processes add value through increased accuracy and relevance, but they come at a cost: the user must invest additional effort to learn how to consume and manipulate enhanced data displays. To make the jump, users will have to be convinced of the value of enhanced interactions and displays, and be willing to spend a little more time working with the data.

    A Vector of Truth

    Data, if collected and analysed correctly, can support or refute our intuitions and beliefs. In addition, the analysis of data can hint at some very human structures, such as those found in language and in the ways in which we conceptualize the world. Data may be used to help us understand our environment. By working with data, we can grasp better models of ourselves and our world.

    Beauty in Exploration

    Visualization is an essential tool for understanding data and drawing inferences from it. Advances in computer performance and graphical displays over the last ten years have opened up possibilities for displaying data in rich and dynamic ways. This has led practitioners down a dangerous path, balanced between aesthetics (the visual impact and design of a data display) and utility (the capability of a visualization to intuitively and efficiently assist the user). That being said, the aesthetics of data visualization can play a huge part in attracting users to the topic being visually described, encouraging them to ask, 'It's pretty, but what is it?' Hopefully the answers to that question will lead to better understanding on all fronts.

  • Data Makes Reasonable Decision-making Possible

    February 6, 2008  |  Miscellaneous

    This guest post is by Andrew Gelman from Statistical Modeling, Causal Inference, and Social Science. He answers the question - "What is data and why should we care about it?"

    Good data are better than bad data, but worst of all are data whose quality you can't assess. Beyond this, we want to use statistical methods that allow us to combine data from many sources. I'm comfortable with regression and multilevel models, but other methods are out there too. In any case, we have to care about our data because inferences and decisions are just about always data-based, implicitly if not explicitly. Being the person in the room with the hard data gives you authority, as well it should.

  • May the Data Be With You, Young Skywalker

    February 4, 2008  |  Miscellaneous

    In response to my question, "What is data and why should we care about it?" - Zach Gemignani from Juice Analytics answered:

    Obi-Wan Kenobi could have been speaking about data in businesses when he said: "It's an energy field created by all living things. It surrounds us, and penetrates us. It binds the galaxy together."

    Data is the residue of every action and interaction that takes place in a company, with customers, and in the marketplace. Businesses have created complicated and effective nets to capture this data as it flies off in all directions. Unfortunately, mountains of data mean nothing. Like young Luke Skywalker's inability to control The Force, a company's inability to make use of data is nothing more than frustration and untapped potential.

    Making use of data takes a subtle combination of capabilities. It takes experience and context about the business, speed and skill to manipulate data, and an ability to visualize and communicate results. Data in the wrong hands is useless if not dangerous; in the right hands data can transform into new insights and informed decisions.

  • What is Data and Why Do We Care About it So Much?

    February 4, 2008  |  Miscellaneous

    I've been fortunate to have worked with people from lots of different fields - statistics, ecology, computer science, engineering, design, etc. If I've learned anything, it's that everyone has a different idea of what data is and why it matters.

    I've found that until I understand what my collaborators mean by data, and what they (and I) are trying to get out of a dataset, it's nearly impossible to get anything useful done.

    To make things a bit more clear (and for my own enjoyment), I asked a select group of people a single question:

    What is data and why should we care about it?

    Those who responded are from different areas of expertise, ranging from statistics, to business, to computer science, to design. Some names you'll recognize while others will be new to you. All are doing interesting things with data.

    I've been looking forward to this series for a couple of weeks now, and my hope is that you will gain a better understanding about what data is and how people are putting it to use. Keep an eye out for posts with the black square image above.

    Here is who has answered so far:

    If you'd like to answer the question yourself, I'd love to see your response too. If you write an answer on your own blog, please post the link in the comments below.

  • Is My Interest in Data Obsessive?

    October 4, 2007  |  Miscellaneous

    I just saw Stranger than Fiction. The main character, Harold Crick, spends much of his life counting. He counts the number of steps it takes for him to walk from his home to the bus stop; he brushes his teeth 76 times every morning; he takes a 45.7-minute lunch break and a 4.3-minute coffee break.

    So much counting and tracking. Sounds kind of familiar. Maybe a little too familiar? Nah.

    71 words. 320 characters. Nine sentences. Wait, now ten. Eleven. Err, twelve...

  • Overgeneralizing on Chinese Takeout

    September 20, 2007  |  Miscellaneous

    My roommate pointed out a couple of weeks ago that I always get Chinese takeout for dinner; however, we never get home at the same time, and most days, she's not even in the apartment when I arrive. How could she, a very bright and educated individual, come to such a conclusion after seeing so little data?

    In fact, by my count, she saw me bring home Chinese takeout only twice before she decided that yes, I do in fact eat Chinese every single day of the week. In reality, I rotate through four choices -- sandwiches, Japanese, pizza, and Chinese -- with a few ventures out every now and then. This week I've had Japanese, hot dogs, Mediterranean twice, a sandwich, a burger, and Chinese.

    This is one of the reasons we need statistics. What we perceive isn't always the truth. I might have had Chinese takeout on Monday and Friday, but do you know what I had on the days in between? If not, can you make an educated guess?
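    The gap between what my roommate saw and what I actually ate can be tallied directly. A minimal Python sketch; the only data are the two Chinese dinners she saw and this week's seven meals as listed above, and `share` is just a helper name chosen for illustration.

```python
from collections import Counter

# The two dinners the roommate actually saw: both Chinese takeout.
observed = ["Chinese", "Chinese"]

# This week's meals as listed in the post (Mediterranean twice).
week = ["Japanese", "hot dogs", "Mediterranean", "Mediterranean",
        "sandwich", "burger", "Chinese"]

def share(meals, cuisine):
    """Fraction of the meals in the list that were the given cuisine."""
    return Counter(meals)[cuisine] / len(meals)

observed_share = share(observed, "Chinese")  # based on only two meals
weekly_share = share(week, "Chinese")        # one meal out of seven

print(f"Roommate's sample: {observed_share:.0%} Chinese (n={len(observed)})")
print(f"This week: {weekly_share:.0%} Chinese (n={len(week)})")
```

    Two sightings give 100%, while the full week gives about 14%: the same kind of conclusion drawn from two data points versus seven.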

  • Where’s the Local Big Mac Price Data?

    August 16, 2007  |  Miscellaneous

    Every now and then I indulge in a Big Mac meal from McDonald's. I feel satisfied while I eat the burger and fries and suck down my diet soda, but afterwards I feel sleepy, sluggish, and fat. Today was one of those days.

    As I ate my satisfying-not-so-satisfying meal, I wondered how much Big Mac prices differ from state to state, or even from city to city. I know there's data going around about Big Mac prices in different countries, but I'm pretty sure prices vary quite a bit within the U.S. alone. I don't remember paying over $6 for the number 1 in California. What a rip-off (and yet I've been to the golden arches at least three times in the past month).

  • Funes, the Memorious: It’s Possible to Remember Too Much

    July 31, 2007  |  Miscellaneous

    Jorge Luis Borges wrote a really good short story in 1944 called Funes, the Memorious. It's about a boy, Funes, who isn't incredibly bright until one day he falls off his horse and hits his head. After the accident, Funes finds that he suddenly has an amazing memory, with which he remembers every single detail of every moment in his life.

    His memory is so vivid that at one point he sees a dog, and a moment later the dog seems different. Funes remembers the way each hair stood on the dog's back, the direction of the breeze, which way the dog's tail was pointed, the perspiration on his own body, where everyone else was, etc. That dog could not possibly be the same dog that he saw a moment ago.

    Funes not only remembered every leaf on every tree of every wood, but even every one of the times he had perceived or imagined it. He determined to reduce all of his past experience to some seventy thousand recollections, which he would later define numerically. Two considerations dissuaded him: the thought that the task was interminable and the thought that it was useless.

    Trying to Remember Too Much

    In this day and age, when so much of everything is stored in databases and logged, is it possible to remember too much? Technology has enabled us to surveil others, videotape every moment of our lives, store every email, take a seemingly endless river of pictures, record conversations, and log data out the wazoo.

    Sure, it's great to have it, but what use can you make of a year's worth of data? What about ten years? Or dare I say, a century's worth of data?

    This is when visualization becomes important. It's our duty to make the ocean of data available without letting the ocean's never-ending vastness overwhelm the data explorers. Otherwise, our technological memory becomes like Funes's, and all is lost. OK, cue the dramatic music... now.

  • Evaluating New York Subway Report Card

    July 26, 2007  |  Miscellaneous, The Times

    I had a chance to browse through some of my subscribed feeds today, and I saw a post called Noisy Subways by Kaiser over at the Junk Charts blog. So I clicked through, since it isn't one of those full feeds, and then I saw the New York subway report card. I smiled, because, well, I made that chart just a few days ago!

    Just a disclaimer: The Times chart was just The New York Times version of the original Straphangers report:

    Straphanger Subway Report Card

    Anyways, there was a bit of a discussion, which, again, I found very amusing. I felt kind of special in a way.

    There were two main points to the post - 1. Noisy data; and 2. Chart is hard to read. I'm very tired right now, so I'll just say a few things.

    Yes, the data is really noisy, but why shouldn't it be? We shouldn't assume that all six variables are positively correlated. It's very possible for a line to be very reliable, but have no seats. One could argue that the lines with more people HAVE to be more reliable, because if something goes wrong, more people are going to get screwed.

    Secondly - sure, the chart is a bit hard to read at a glance, but who's the audience? New Yorkers are the audience, and the first thing that they're going to do is look for their subway line. That's what I did. With the audience in mind, I think the chart serves its purpose.

    Most of the commenters provided decent ideas for alternative graphics. My opinion is that with this kind of data, it's up for grabs. Audience is key, though, for charts, graphs, plots, maps, etc. in a newspaper. Spiders and whiskers won't make sense to many people. You'd be amazed at how many people don't know how to read a scatter plot. The public is getting better though. They'll get there.

    As for the person who left the comment about the gaps in the chart, I'm going to assume that was written in haste. Some lines are tied, hence some blank spaces.

    Welp, that was fun. Yawwwwn. Time for bed.

  • Difficulty Keeping Up with the Feeds

    July 24, 2007  |  Miscellaneous

    Google Reader Trends

    This is just really amusing to me. Above is a bar plot, from Google Reader, of the number of items I've read in the past 30 days, with each bar representing a day. Quite easy to see when I had a little bit too much time on my hands. Right when the internship starts, the number of items read plummets. I miss my subscribed feeds =(.

  • New Lessons Every Day

    July 20, 2007  |  Miscellaneous, The Times


    Every day I learn a lot, and every day I get better. For most of the day today, I worked on a single graphic (that hopefully runs in the paper). I gave it to the person in charge, and oh man, there was a lot to change. Fonts, labels, fill colors, bar widths, spacing, layer orientation, size... on and on and on. I think it might have been faster for him to make the graphic himself than it was for him to fix mine.

    Sigh. Gotta practice.

    The graphic above is the number of daily steps I've taken since I started wearing a pedometer. Can you tell when I moved to the city and was forced to walk to the subway and work?

  • Motivation to Change Behavior

    June 28, 2007  |  Miscellaneous

    My mom recently, um, as in yesterday, got in a car accident. She was making a left turn at a light, and someone coming from the opposite direction decided to run a red light, sending my mom's car into a 90-degree turn. Fortunately, my mom suffered only minor burns from the airbag deployment; however, the car was totaled. The first thing that my mom did today -- the day after this major accident -- was go to work.

    This got me to thinking, what is enough to motivate someone to change her behavior? For some, when something really drastic happens, like a car accident, they gain a new outlook on life and vow to "live life to the fullest" or "value every moment". Then there are others, like my mom, who move along, because all they want is for their lives to be normal again.

    I wish I knew where to look for related research, but a quick search on Google Scholar didn't give me a whole lot.

    Let's see here... what motivates people to change their behaviors?

    • A significant, personal event
    • Change in surroundings
    • Coercion

    Surely, there's more. I'm going to dwell on this some more.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.