Terry Pettijohn and Brian Jungeberg of Mercyhurst College took a very close look at the curves, um, measurements of past Playboy Playmates of the Year in relation to the state of the economy.
The New York Times announced the opening of their Developer Network a couple of days ago. It's their "API clearinghouse and community." It might seem kind of weird that a newspaper company has an API, but as many FlowingData readers know, the Times prides itself on innovation.
The Campaign Finance API is currently available:
With the Campaign Finance API, you can retrieve contribution and expenditure data based on United States Federal Election Commission filings. Campaign finance data is public and is therefore available from a variety of sources, but the developers of the Times API have distilled the data into aggregates that answer most campaign finance questions. Instead of poring over monthly filings or searching a disclosure database, you can use the Times Campaign Finance API to quickly retrieve totals for a particular candidate, see aggregates by ZIP code or state, or get details on a particular donor.
Anyone who has tried to play with FEC data, myself included, knows that this API is cool. You could get the data directly from the FEC, but it's a bit of a painstaking process. Now you don't have to sift through a bunch of reports or an awkward user interface.
The Movie Review API is next in line. After that, who knows, but it's a good step forward for The Times.
[via serial consign]
There's lots of free geographical data about what's going on at the surface of our planet. It's a different story for what's going on underneath though. OneGeology aims to be the solution to that problem.
OneGeology is an international initiative of the geological surveys of the world and a flagship project of the 'International Year of Planet Earth'. Its aim is to create dynamic geological map data of the world available via the web. This will create a focus for accessing geological information for everyone.
I've never been one for the geology, but if the data (and interactive maps) were easily accessible, there certainly would be a peak in interest.
[via msnbc | Thanks, Samantha]
Whaaa? Cool beans.
43 Things is a goal-setting community where people set goals, cheer each other on, and connect with others who are trying to achieve the same thing. Even if you're not setting goals yourself, it's still interesting and often amusing to see what others have set out to do, e.g. go skinny dipping, have a one night stand, and be myself.
The U.S. Census Bureau released their 2008 Statistical Abstract, the National Data Book, not too long ago (um, like in January). There are state rankings and data in 30 categories and many more sub-categories. All this data is in the form of PDFs and Excel spreadsheets, which doesn't lend much to readability, but still, it's nice to have access to all the information.
Maybe FlowingData readers can put together a giant statistical abstract all conveyed through graphics. That would be cool. Above are six data sets that I picked from the billion or so available.
Chris Harrison put together a series of Internet maps that show how cities are interconnected by router configuration. Similar to Aaron Koblin's Flight Patterns, Chris chose to map only the data, which makes an image that looks a lot like strands of silk stretched from city to city. With these maps, viewers gain a sense of connectivity in the world - and as expected the U.S. and Europe are a lot brighter than the rest.
The Academy's efforts to crack down on bootlegging haven't done a whole lot. Focus on stopping one area, like downloading, and another area, like Region 5 DVDs from overseas, just grows more prolific. A quick search in the right places will show you that piracy isn't going away any time soon.
I even met someone whose job it was to find people who were "seeding" films through BitTorrent and to report them to police. I got the impression that it was a really tedious process and people go uncaught most of the time. I'm, uh, not condoning this, but if you don't want to get caught, just make sure you stop the torrent once you've got your file.
Bootlegging on Seinfeld
Bootlegging always reminds me of the Seinfeld episode when Jerry somehow gets caught up in a bootlegging scheme:
[T]here was a kid couldn't have been more than ten years old. He was asking a street vendor if he had any other bootlegs as good as Death Blow. That's who I care about. The little kid who needs bootlegs, because his parent or guardian won't let him see the excessive violence and strong sexual content you and I take for granted.
For those interested (and I know you are), the term bootleg originates from hiding flasks of liquor in the legging of boots. Ahoy, matey.
Photo by mumelopics
In light of the MySpace photo breach (due to their negligence) a couple of months ago, I got to wondering about other recent data breaches. It turns out Attrition.org keeps a Data Loss Archive and Database that contains known data breaches since 2000. Records include date, number affected, groups involved, summaries, and links to reported stories and updates. It's surprisingly detailed and even better, it's all available for download.
The above graphic shows the 10 largest data breaches which affected millions. I thought the 800,000 records thieved from UCLA a couple of years ago (that my information was unfortunately a part of) was a lot. That's nothing compared to these.
Notice the higher frequency as we get closer to the present?
[Thanks Ryan | Welcome, Boing Boing readers]
For our Humanflows project, we used the United Nations Common Database for our demographic numbers. Anyone who has used the common database knows that it's not especially user-friendly. You have to go through a series of non-intuitive dropdown menus to get the data you want. You then have to decipher the downloaded data's CSV format. The recently released UNdata relieves a lot of these problems.
I don't think I've seen a single Rambo all the way through nor do I remember the premise of any of the movies, but I still found these kill counts amusing. Notice the near doubling of deaths each sequel. Yo, Adrian!!! Yeah, I know, wrong movie, but come on, is there really a difference?
Here's a graph showing kill counts (mostly for my own entertainment):
Mr. Rambo may have gotten more violent in the latest installment, but it looks like he also grew more modest.
In their paper Gender Differences in Mate Selection: Evidence from a Speed Dating Experiment, Fisman et al. had a bit of fun with a speed dating dataset. Here's what they found:
Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.
The dataset is substantial, with over 8,000 observations for answers to twenty-something survey questions. With questions like How do you measure up? and What do you look for in the opposite sex?, this dataset is definitely high on human element and should be fun to play with.
[via Statistical Modeling]
FedStats - Provides access to the full range of official statistical information produced by the Federal Government, including population, education, crime, and health care.
MAPLight - A detailed database that brings together information on campaign contributions and votes in the California legislature. Check out the video tour.
EarthTrends - A collection of information regarding the environmental, social, and economic trends that shape our world.
Angry Employee Deletes All of Company's Data - A woman about to "lose" her job goes to the office at night and deletes 7 years' worth of data. Can we say backup, please?
In its continued efforts for absolute power over all information ever created in the world, Google will be hosting open-source scientific datasets at its research section. Here are the presentation slides from Google's Jon Trowbridge:
In the next few weeks, terabytes of data will be made available to the public. For example, all 120 terabytes of Hubble Space Telescope data is going to be online. That's kind of cool but kind of scary too. Such a large amount of data is bound to affect lots of people on many different levels.
For scientists, data will be available for deeper research. For the scientists who generated the data, their research could be placed under more critical scrutiny. Existing data applications might be eclipsed by the data giant, or it could go the other way such that the general public grows more aware of data-type things. Mashups will in turn spring up as well as more visualization, I am sure.
All of this Doesn't Matter If...
Of course, all of this depends on what data end up on the Google servers and how easily accessible the data are. Knowing Google, I don't think accessibility will be a problem. Getting data will be the super hard part. Who will be willing to contribute their data? What type of data will get contributed? Will it be the good, raw data or more cleaned and processed data? Do researchers even want to share their data with the rest of the world?
It's going to be interesting to see what goes up on Google Research in these coming weeks.
Iraq Body Count keeps track of civilian deaths by cross checking media reports and hospital, morgue, and NGO figures. Along with a widget counter that you can post on your blog or site, IBC also makes their database available for download.
Systematically extracted details about deadly incidents and the individuals killed in them are stored with every entry in the database. The minimum details always extracted are the number killed, where, and when.
The data comes in two sets -- incident reports and individuals who have lost their lives -- in the form of CSV files.
Admittedly, the data is a little depressing, but it's still very necessary.
I love to look at how the current week's movies are doing at the box office. I'm not really sure what it is. I think it's kind of like a gauge for what good movies are out; or maybe I'm just constantly amazed by the millions of dollars that movies make; or I think it could be my addiction to numbers?
Something that always strikes me as interesting is how movies are always breaking records at the box office. So and so movie just broke the record for most money made over a single weekend or a month or a long holiday weekend or for a Thursday when there was at least 2 inches of rain and a dog skateboarded two miles.
I took a look at the 25 highest grossing American films, adjusted for inflation. I'm so tired of hearing statistics for money comparisons over time that don't adjust for inflation. Wow, gasoline prices are at an all time high. Well guess what -- so are milk, bread, burgers, televisions, light bulbs, paper, cars, and everything else on the planet. Sorry, slight tangent.
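For anyone who wants to try the adjustment themselves, here's a minimal sketch of the usual approach: scale each amount by the ratio of a price index (like the CPI) in the target year to the index in the original year. The function names, film labels, and index values below are placeholders I made up for illustration, not real CPI figures or real grosses, and this isn't necessarily how the list above was computed.

```python
def adjust_for_inflation(nominal, index_then, index_now):
    """Express a dollar amount from an earlier year in a later year's dollars.

    index_then and index_now are price index values (e.g. CPI) for the
    original year and the target year.
    """
    return nominal * index_now / index_then


def rank_by_adjusted_gross(films, index_by_year, target_year):
    """Return (title, adjusted gross) pairs, highest adjusted gross first."""
    index_now = index_by_year[target_year]
    adjusted = [
        (title, adjust_for_inflation(gross, index_by_year[year], index_now))
        for title, year, gross in films
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Placeholder numbers only: hypothetical films and a hypothetical index.
index_by_year = {1950: 50.0, 2000: 200.0}
films = [
    ("Old Film", 1950, 100.0),  # (title, release year, nominal gross)
    ("New Film", 2000, 300.0),
]

ranking = rank_by_adjusted_gross(films, index_by_year, target_year=2000)
for title, gross in ranking:
    print(title, gross)
```

With these made-up numbers the older film comes out on top even though its nominal gross is a third of the newer one, which is the whole reason unadjusted comparisons over time are misleading.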
Download the Wallpaper
As an early birthday gift to you, here are my results in wallpaper form:
The movie titles are color coded for genre and the higher grossing films are in a larger font. Drama and action/adventure clearly dominate -- The hills are alive. Luke, I am your father. Phone home. I'll never go hungry again.
Surprisingly (at least to me), only 7 of the top 25 films won the Oscar for best picture and of the top 50, only 9 won best picture.
Baseball (or all sports for that matter) statistics are all over the place. You can easily find data for pretty much whatever sport and for whichever player you want at any given time. The problem is that if you want to download all of the data at once, you usually have to write a script and do some parsing. Who wants to do that? I don't.
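To give a sense of what that script-and-parse chore looks like, here's a rough sketch that pulls the rows out of an HTML table and writes them as CSV, using only Python's standard library. The sample markup, player names, and numbers are made up; a real stats page will be messier (nested tables, links inside cells, pagination), so treat this as a starting point under those assumptions.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Turn the table rows found in an HTML string into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up fragment standing in for a downloaded stats page.
SAMPLE = """
<table>
  <tr><th>Player</th><th>Games</th><th>Hits</th></tr>
  <tr><td>Player A</td><td>10</td><td>12</td></tr>
  <tr><td>Player B</td><td>8</td><td>9</td></tr>
</table>
"""

print(table_to_csv(SAMPLE))
```

Multiply that by every player, season, and page layout, and you can see why a single download link is so much nicer.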
For our humanflows visualization, we used data from the United Nations Common Database and the Migration Information Source. The great thing about these types of sources is that they are publicly available so that everyone gets to have fun with the data. The downside is that the data is accessible via a user interface that often makes it a chore to get all of the data.
Hence, to save you some time, you can now download the migration database that we used. I don't see any reason why you have to go through the whole data importing process when we already did it. Enjoy!
Disclaimer: Keep in mind that the data is from the United Nations and Migration Information Source, so you should refer to the two sites for any documentation. In a nutshell, the inflows table is from MIS and the rest is from United Nations. If you're looking for more, you might also want to check out OECD. I really wanted to use their data at the time, but was having trouble accessing it from Spain.
We all know fast food is incredibly bad for us and yet we still eat it. Why? Because it has tons of fat and tastes delicious. Never mind that we will die a few days earlier for every French fry we eat.
Over at Calorie Counter, they try to make us feel guilty with numbers. Check out the Carl's Jr. Double Six Double Dollar Burger with 1,520 Calories and a delicious 111 grams of fat. I'm a little surprised that it beat out the Burger King Triple Whopper with cheese. I shudder just thinking about eating one of those.
Anyways, there's a whole lot of numbers here but not an incredible amount of meaning. How bad is bad? How much fat should I consume per day? Is 111 grams of fat bad? If yes, how will it directly affect me? Yes, 111 grams of fat is bad for you. You will directly feel the effects as you sit on the toilet in the morning wondering why it is taking you so long to take a dump. Now that's context.
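For a slightly more dignified kind of context, here's a quick back-of-the-envelope sketch. It assumes the usual nutrition-label approximation of 9 Calories per gram of fat and, as a labeled assumption, the 65 gram daily reference value for fat that U.S. labels have used for a 2,000 Calorie diet (swap in whatever reference you prefer). The function and variable names are my own.

```python
KCAL_PER_GRAM_FAT = 9  # common nutrition-label approximation


def fat_context(total_kcal, fat_grams, daily_fat_grams):
    """Put a fat figure in context against the item and a daily reference."""
    fat_kcal = fat_grams * KCAL_PER_GRAM_FAT
    return {
        "fat_kcal": fat_kcal,
        "share_of_item_kcal": fat_kcal / total_kcal,
        "share_of_daily_fat": fat_grams / daily_fat_grams,
    }


# Burger figures from the post; the 65 g daily reference is an assumption.
ctx = fat_context(total_kcal=1520, fat_grams=111, daily_fat_grams=65)

print(f"Calories from fat: {ctx['fat_kcal']}")
print(f"Share of the burger's Calories: {ctx['share_of_item_kcal']:.0%}")
print(f"Share of the daily fat reference: {ctx['share_of_daily_fat']:.0%}")
```

Under those assumptions, roughly two thirds of the burger's Calories come from fat, and one burger is about 171% of the daily fat reference.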
Also, with all the numbers, I bet all the tables would benefit from some kind of chart or, at the least, a simple infographic. Any takers? We should have a contest for who can make fast food the least appealing using nutritional data and without bending the truth.