Predictify takes James Surowiecki's The Wisdom of Crowds to heart. Surowiecki argues that when certain factors are present (for example, group diversity), the group is often smarter than any individual in it. Predictify has turned this "principle" into a money-making platform.
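For a rough feel of the idea behind Surowiecki's argument, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration (the true value, the noise level, the crowd size), and it says nothing about how Predictify itself works. It compares the error of the crowd's average guess with the average error of the individual guessers.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical setup: none of these numbers come from Predictify or the book.
TRUE_VALUE = 100.0   # the quantity everyone is trying to guess
NOISE_SD = 20.0      # how far off a typical individual tends to be
CROWD_SIZE = 1000    # number of independent guessers

# Each person guesses the true value plus their own independent error.
guesses = [random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(CROWD_SIZE)]

# Error of the crowd's pooled (average) guess.
crowd_guess = statistics.mean(guesses)
crowd_error = abs(crowd_guess - TRUE_VALUE)

# Average error of the individual guessers.
individual_errors = [abs(g - TRUE_VALUE) for g in guesses]
typical_individual_error = statistics.mean(individual_errors)

print(f"Crowd error:              {crowd_error:.2f}")
print(f"Typical individual error: {typical_individual_error:.2f}")
```

The averaged guess can never be further off than the average individual, and when the errors are independent it is usually much closer. If everyone shares the same bias, though, averaging does not remove it, which is one way to read the "certain factors" caveat.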
In response to my question, "What is data and why should we care about it?" - Zach Gemignani from Juice Analytics answered:
Obi-Wan Kenobi could have been speaking about data in businesses when he said: "It's an energy field created by all living things. It surrounds us, and penetrates us. It binds the galaxy together."
Data is the residue of every action and interaction that takes place in a company, with customers, and in the marketplace. Businesses have created complicated and effective nets to capture this data as it flies off in all directions. Unfortunately, mountains of data mean nothing. Like young Luke Skywalker's inability to control The Force, a company's inability to make use of data is nothing more than frustration and untapped potential.
Making use of data takes a subtle combination of capabilities. It takes experience and context about the business, speed and skill to manipulate data, and an ability to visualize and communicate results. Data in the wrong hands is useless if not dangerous; in the right hands data can transform into new insights and informed decisions.
I've been fortunate to have worked with people from lots of different fields - statistics, ecology, computer science, engineering, design, etc. If I've learned anything, it's that everyone has a different idea of what data is and why it matters.
I've found that until I've understood what my collaborators mean by data and what they (and I) are trying to get out of a dataset, it's near impossible to get anything useful done.
To make things a bit clearer (and for my own enjoyment), I asked a select group of people a single question:
What is data and why should we care about it?
Those who responded are from different areas of expertise, ranging from statistics, to business, to computer science, to design. Some names you'll recognize while others will be new to you. All are doing interesting things with data.
I've been looking forward to this series for a couple of weeks now, and my hope is that you will gain a better understanding about what data is and how people are putting it to use. Keep an eye out for posts with the black square image above.
Here is who has answered so far:
- May the Data Be With You, Young Skywalker by Zach Gemignani
- Data Makes Reasonable Decision-making Possible by Andrew Gelman
- Increasing Data Literacy Across the General Public With Truth and Beauty by Matt Hurst
- Showing Historical & Cultural Connections and Mapping Influence by Mike Love
- Understanding Data, Not Just the Realm of Scientists in Ivory Towers by Hadley Wickham
- A Lesson in Recycling Chartjunk as Junk Art by Kaiser Fung
If you'd like to answer the question yourself, I'd love to see your response too, or if you write an answer on your own blog, please do post the link in the comments below.
I just put down $20 on today's game for the New York Giants to cover the 12-point spread. Of course, knowing me, I got to thinking how that betting line is decided. Is there one person who calculates the spread? Do Las Vegas casinos just put up numbers based on past experiences? I did a little bit of research, and here's what I found.
FedStats - Provides access to the full range of official statistical information produced by the Federal Government, including population, education, crime, and health care.
MAPLight - A detailed database that brings together information on campaign contributions and votes in the California legislature. Check out the video tour.
EarthTrends - A collection of information regarding the environmental, social, and economic trends that shape our world.
Angry Employee Deletes All of Company's Data - A woman about to "lose" her job goes to the office at night and deletes 7 years' worth of data. Can we say backup, please?
Peter Donnelly talks about the misuse of statistics in his TED talk from a couple of years back. The first two-thirds of the talk is an introduction to probability and its role in genetics, which, admittedly, didn't hold much of my interest. The last third, however, gets a lot more interesting.
Donnelly talks about a British woman who was wrongly convicted in large part because of a misuse of statistics. A so-called expert cited how improbable it would be for two children to die of sudden infant death syndrome, but it turns out that "expert" was making incorrect assumptions about the data. This doesn't surprise me since it happens all the time.
People misuse statistics every day (intentionally and unintentionally), and oftentimes it doesn't hurt much (which doesn't make it any better), but in this case improper use directly affected someone's life in a very big way. One of the most common assumptions I see is that every observation is independent, which often is not the case. As a simple example, if it's raining today, does that change the probability that it will rain tomorrow? What if it didn't rain today?
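To put rough numbers on the rain example, here is a small sketch. The probabilities are hypothetical, picked only to make the arithmetic easy to follow, and are not taken from Donnelly's talk or from any weather data. It compares the chance of two rainy days in a row when you wrongly assume independence with the chance when today's rain makes tomorrow's rain more likely.

```python
# Hypothetical probabilities, for illustration only.
P_RAIN = 0.3             # overall chance of rain on any given day
P_RAIN_GIVEN_RAIN = 0.6  # chance of rain tomorrow if it is raining today

# Wrong shortcut: treat the two days as independent and multiply.
p_two_days_naive = P_RAIN * P_RAIN

# Accounting for dependence: P(today and tomorrow) = P(today) * P(tomorrow | today).
p_two_days_dependent = P_RAIN * P_RAIN_GIVEN_RAIN

# Chance of rain tomorrow if it did NOT rain today, from the law of total
# probability: P(rain) = P(rain|rain) * P(rain) + P(rain|dry) * P(dry).
p_rain_given_dry = (P_RAIN - P_RAIN_GIVEN_RAIN * P_RAIN) / (1 - P_RAIN)

print(f"Two rainy days, assuming independence: {p_two_days_naive:.2f}")
print(f"Two rainy days, with dependence:       {p_two_days_dependent:.2f}")
print(f"Rain tomorrow given a dry today:       {p_rain_given_dry:.2f}")
```

With these made-up numbers, the independence shortcut understates the chance of back-to-back rain by half. The same kind of multiplication is what goes wrong when the chance of two related events is computed as if they had nothing to do with each other.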
In other words, the next time you're thinking of making up or tweaking data, don't; and the next time you need to analyze some data but aren't sure how, ask for some help. Statisticians are nice and oh so awesome.
Here's Donnelly's talk:
The National Science Foundation is running their annual Science and Engineering Visualization Challenge.
Some of science's most powerful statements are not made in words. From the diagrams of DaVinci to Hooke's microscopic bestiary, the beaks of Darwin's finches, Rosalind Franklin's x-rays or the latest photographic marvels retrieved from the remotest galactic outback, visualization of research has a long and literally illustrious history. To illustrate is, etymologically and actually, to enlighten.
You can do science without graphics. But it's very difficult to communicate it in the absence of pictures. Indeed, some insights can only be made widely comprehensible as images. How many people would have heard of fractal geometry or the double helix or solar flares or synaptic morphology or the cosmic microwave background, if they had been described solely in words?
To the general public, whose support sustains the global research enterprise, these and scores of other indispensable concepts exist chiefly as images. They become part of the essential iconic lexicon. And they serve as a source of excitement and motivation for the next generation of researchers.
They've been accepting submissions since September of last year and will continue to do so until May 31, 2008. The rules are pretty wide open, with last year's winners in the areas of photography, illustration, and interactive and non-interactive media. Basically, it's whatever you want it to be. The winners will be published in the journal Science, and one of the winning submissions will get to be on the cover of the prestigious journal.
Whenever I tell people that I study Statistics, they almost always respond, "So what do you do with that?" After they get over their initial shock, I often get, "If I were in Statistics, I'd study sports statistics." I usually respond by telling them that while it would probably be a lot of fun, I don't think there is much money in it (because I gotta eat, right?) and that statisticians usually take that as a part-time gig. I'm thinking I might have to change that response though, as the game of sports statistics is showing signs of life with the recent Journal of Quantitative Analysis in Sports.
Articles in the Journal of Quantitative Analysis in Sports (JQAS) come from a wide variety of sports and perspectives and deal with such subjects as tournament structure, frequency and occurrence of records and the optimal focus of training for decathlons. Additionally, the journal serves as an outlet for professionals in the sports world to raise issues and ask questions that relate to quantitative sports analysis. Edited by economist Benjamin Alamar, articles come from a diverse set of disciplines including statistics, operations research, economics, psychology, sports management and business.
Maybe I'll read regularly and take up sports betting as my new hobby.
Jared Spool had a chat with Andrew (multimedia) and Steve (graphics) at The New York Times. I'm sure you're familiar with their work. They chat about the design process of the interactive pieces on The Times site like the transcript analyzer, the home run chart, and plenty of other specific examples. They also go into a bit about where they get inspiration from (e.g. old Fortune magazines, photographs, advertisements) as well as how they go about creating their more innovative pieces.
Keep in mind it's on the User Interface Engineering blog, so it's mostly focused on, well, the user interaction and design and less on where data comes from, the journalistic process, etc, but still, it's a pretty good listen.
[via Visual Methods]
For the re-launch of the Microsoft Windows Live platform, Firstborn created a generative art installation taking thousands of smiling faces and placing them into a 3-D world. It was an outdoor installation (done in Processing) projected on a seven-story sphere, and I am sure it wowed a whole lot of people. It's definitely amazing me, and all I'm seeing are screenshots and a demo.
Interactive Travel Time and House Price Maps - Tom from Stamen recently announced some really slick mapping. They're very attractive and very responsive. Sidenote: Look forward to a guest post from Tom in the near future.
175+ Data and Information Visualization Examples and Resources - Meryl has posted an extensive list of visualization examples and resources available online. Thanks for linking here, Meryl!
GPSed - A site that takes advantage of the data available from your mobile phone, mainly pictures and your GPS trace.
Visualizing the History of Living Spaces - Ivanov et al. discuss the challenges of visualizing motion data from 215 motion sensors in a large office building.
Virgil Griffith has created a series of graphs called Books that Make You Dumb. He correlates top books on Facebook by school and the corresponding schools' average SAT scores. Notice Freakonomics is pretty far to the right. Nice.
The graphs of course aren't really that statistical, nor are they especially beautiful, but hey, just take it for what it is, and it's kind of amusing. Plus, it's a good example of how you can use data from different sources to find something interesting.
The thing about data is that it can be very convincing. Maybe it's because it's so hard to argue against numbers, or maybe it's just that there's so much of it. In any case, here are six datasets that undoubtedly changed the way some people behave or showed us something that brought about a different way of thinking about things.
Those of you who have been around since the beginning know that I am just obsessed with my pedometer. Lately, though, I haven't felt inclined to go for a winter stroll in the below-freezing weather. When I was keeping track of my steps though, one of the difficulties was staying consistent. Sometimes I would forget to wear my pedometer, while other times I would forget to record my steps.
I imagine Walker Tracker could help a bit in solving that second problem. I know it was always easier to make it to the gym when I knew one of my friends was going to meet me there. Walker Tracker is like that friend at the gym. The site lets you keep track of your steps as well as see how others are doing.
We're trying to change the world. We're trying to get you and us and everyone we know off the elevator and out of the car and onto the sidewalks and trails. We're doing it one step at a time.
Get up, stand up and walk.
OK, maybe it's a little hoorah, but if you feel like actually accomplishing a new year's resolution this year, Walker Tracker could be a good place to start.
[via Web Worker Daily]
Warning: Tangent ahead, but I promise, there's a point.
About a year ago, I went to my 6-month teeth checkup, and the dentist told me that I had a cavity on the bottom back left and another on the bottom back right. Since I was about two years overdue for a checkup (and didn't floss every day), I wasn't surprised.
One week later, I was back to get my fillings. I sat down in that terrifying chair that looks like something aliens use to probe specimens. The drilling began.
My teeth are really sensitive, so no matter how many shots of novocaine she injected (3 or 4), I still felt pain. Here's how it went with the first filling. She drilled. I winced. She stopped. We took a short 1-minute break. She drilled. I winced. We took a break.
We went on like that for about 20 minutes -- all the while she kept telling me it was a tiny cavity and that it shouldn't hurt. Yeah, OK, whatever. Maybe if she actually stuck the needle in the nerve and not just some random place in my gums, it would have worked.
Anyways, she finally finished and suggested we put off the second filling until the next visit in six months. I thought to myself, "Uh, won't my cavity just get worse in 6 months??" I was in enough pain already though (with beads of sweat to prove it) so I agreed despite my concerns.
I ended up missing that next appointment.
In its continued efforts for absolute power over all information ever created in the world, Google will be hosting open-source scientific datasets at its research section. Here are the presentation slides from Google's Jon Trowbridge:
In the next few weeks, terabytes of data will be made available to the public. For example, all 120 terabytes of Hubble Space Telescope data is going to be online. That's kind of cool but kind of scary too. Such a large amount of data is bound to affect lots of people on many different levels.
For scientists, data will be available for deeper research. For the scientists who generated the data, their research could be placed under more critical scrutiny. Existing data applications might be eclipsed by the data giant, or it could go the other way such that the general public grows more aware of data-type things. Mashups will in turn spring up as well as more visualization, I am sure.
All of this Doesn't Matter If...
Of course, all of this depends on what data end up on the Google servers and how easily accessible the data are. Knowing Google, I don't think accessibility will be a problem. Getting data will be the super hard part. Who will be willing to contribute their data? What type of data will get contributed? Will it be the good, raw data or more cleaned and processed data? Do researchers even want to share their data with the rest of the world?
It's going to be interesting to see what goes up on Google Research in these coming weeks.
There's a nice real-time (?) map on (suit)men Entertainment. Click the black rectangle on the bottom left-hand corner to see the entire map. Supposedly the map is powered by Google, so I want to say it's showing search data or something of that sort. To be honest though, I have no clue.
Whenever a number pops up, there's a line that connects some country to Japan (the site's origin), so I'm guessing they're mapping something like accesses to the (suit)men site from whatever country. Oh well, no matter. Look how pretty. It's entertainment, and it managed to entertain me for a good few minutes (which says a lot with my short attention span :). Does anyone know what they're showing?
[via Simple Complexity]
Iraq Body Count keeps track of civilian deaths by cross checking media reports and hospital, morgue, and NGO figures. Along with a widget counter that you can post on your blog or site, IBC also makes their database available for download.
Systematically extracted details about deadly incidents and the individuals killed in them are stored with every entry in the database. The minimum details always extracted are the number killed, where, and when.
The data comes in two sets -- incident reports and individuals who have lost their lives -- in the form of CSV files.
The data is a little depressing, but still very necessary.
You've probably already noticed (unless you're subscribed to the feed), but FlowingData now has a brand new look and feel. It started with a tweak, and then I just got carried away. I think it took a turn for the better though. Some of the changes include a new logo, featured articles, and more focus on visualization.
Jeffrey Heer et al. write in Design Considerations for Collaborative Visual Analytics about a couple of models for social visualization -- the information visualization reference model and the sensemaking model. The former is a simpler, more straightforward model that goes from raw data -> processed data -> visual structures -> actual visualization, while the latter is a bit more complicated, with similar stages but with feedback loops. My main reflections weren't so much about the ideas proposed by the paper. Rather, I'm more interested in what was not mentioned -- not only in this paper but in other social data analysis papers.
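To make the reference model's stages a little more concrete, here is a toy sketch of the raw data -> processed data -> visual structures -> visualization pipeline. The function names, the sample records, and the text bar chart are all made up for illustration, and none of it comes from the paper.

```python
from collections import Counter

def to_data_table(raw_records):
    """Raw data -> processed data: count how often each category appears."""
    return Counter(raw_records)

def to_visual_structure(table, max_width=20):
    """Processed data -> visual structures: map each count to a bar length."""
    largest = max(table.values())
    return {label: round(count / largest * max_width)
            for label, count in table.items()}

def render(structure):
    """Visual structures -> visualization: draw the bars as plain text."""
    label_width = max(len(label) for label in structure)
    lines = [f"{label.ljust(label_width)} | {'#' * length}"
             for label, length in sorted(structure.items())]
    return "\n".join(lines)

# Hypothetical raw records: how six people got to work.
raw = ["walk", "bike", "walk", "car", "walk", "bike"]

chart = render(to_visual_structure(to_data_table(raw)))
print(chart)
```

Each stage only looks at the output of the one before it, which is the one-directional flow the simpler model describes. The sensemaking model would add loops back from the final view to the earlier stages.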