It's a given that some colleges and programs give more A's than others, but according to data collected by Stuart Rojstaczer from about 230 schools, it seems that average GPAs have been increasing overall.
Data is everywhere, but use of data is not. So many of our efforts are centered around making money or getting people to buy more things, and this is understandable; however, there are neglected areas that could actually have a huge impact on the way we live. Jake Porway, a data scientist at The New York Times, has a proposition for you, tentatively called Data Without Borders.
[T]here are lots of NGOs and non-profits out there doing wonderful things for the world, from rehabilitating criminals, to battling hunger, to providing clean drinking water. However, they’re increasingly finding themselves with more and more data about their practices, their clients, and their missions that they don’t have the resources or budgets to analyze. At the same time, the data/dev communities love hacking together weekend projects where we play with new datasets or build helpful scripts, but they usually just culminate in a blog post or some Twitter buzz. Wouldn’t it be rad if we could get these two sides together?
Yes. It would be rad. If you're an NGO looking for help or a data hacker with a desire to provide some, sign up for the mailing list, and help Jake get the ideas rolling.
Some people use a passcode on their iPhones simply to prevent their kids from mucking around with them or accidentally calling the police. Others use one for actual security reasons, because there's private information on the phone that they wouldn't want a stranger to have access to. If you're in the latter group, hopefully you use a passcode that isn't easy to guess.
Daniel Amitay, developer of the Big Brother Security Camera app (now removed from the App Store), added some code to the app to record user passcodes anonymously. Here are his findings.
A little over a week ago, Sony was hit with yet another security breach. This time, over one million passwords, which were stored in plain text, were released into the wild. Software architect Troy Hunt took a closer look at the dataset and found just how predictable people's passwords are.
The Pew Research Center churns out a lot of interesting results from a number of surveys about online and American culture, but it has usually shared only aggregated results in pre-made charts and graphs. This is well and good for the information-consuming public; however, these results can spawn curiosities that are fun to dig into. Luckily, the Pew Research Center launched a Data Sets section that provides raw survey responses and the questions in a variety of easy-to-use data formats.
Our raw data, previously posted only as SPSS files, is now available in comma-delimited (.csv) format for all reports going back to 2003. We hope that making our data available in this open-source format will make analysis easier for researchers who don’t own a copy of SPSS to analyze our data.
This should be fun. Recent datasets include the social side of the Internet, health tracking habits, and reputation management.
The argument behind this graph in The Wall Street Journal is that the middle class has most of the money, which ties into a larger argument about who should be taxed what. There is, after all, a spike in the middle. Is that really the case though? Sound off in the comments.
(Cheat sheet: Jonathan Chait explains what's going on and Kevin Drum improves the graph to show more truth, although his graph can be improved, too. Grab the data here [Excel spreadsheet] from the IRS, and give it a go.)
For the statistical nerd in you or for the child you are raising as one, Nausicaa Distribution on Etsy sells handmade gifts inspired by statistical distributions. Shown above is the dastardly gang of five evil distribution plushies: Weibull, Cauchy, Poisson, Gumbel, and Erlang. Judging by their moustaches, you'd better watch out when they're around.
OkCupid adds another report to their growing list of analyses on relationships. This time around, they look at sex and how ideas vary by demographic. The above graph shows per capita GDP versus portion of people looking for casual sex.
We were amazed at this result—money seems to be a more powerful influence on sex drive than culture or even religion.
You have, for example, Portugal, Oman, Slovenia, and Taiwan within a few pixels of each other on the right side of the graph, and Syria, Sri Lanka, and Guatemala almost stacked on the left, and all of them sit along the trend line.
Interesting as usual. What amazes me more is that so many people answer such private questions. Have any of you tried OkCupid? Are these questions part of the matching process?
See OkCupid for more findings on sex such as drive and body type and Twitter usage and commitment.
Researchers Alasdair Allan and Pete Warden have found that the iPhone records cell tower access, and hence your location, in an easy-to-read file that is transferred as you switch devices. And it does this whether you like it or not.
The more fundamental problem is that Apple are collecting this information at all. Cell-phone providers collect similar data almost inevitably as part of their operations, but it’s kept behind their firewall. It normally requires a court order to gain access to it, whereas this is available to anyone who can get their hands on your phone or computer.
Allan and Warden provide an open-source application, iPhone Tracker, that maps that data. The good news is that the data doesn't seem to go anywhere other than your own backups and devices. Privacy concerns aside, this kind of makes me wish I had an iPhone, although I suspect my map would be painfully boring.
Over the past four years there was a 43 percent increase in prescriptions for antidepressants. Some news outlets attribute this rise to the recession. People more depressed equals more drugs. Ben Goldacre of Bad Science explains why said outlets need to be more careful with their analyses.
From what I can tell, all the reports took an aggregate (the 43 percent) and then made a big assumption to explain it. I'm all for data journalism, but statistics is rarely that straightforward.
In a guest post for the guardian.co.uk Datablog, I thought out loud about the possible end of Data.gov and what it means for open government data. Let me know what you think.
Update: Funding might not be cut completely (for now).
Last week, there were rumblings over the end of the Statistical Abstract, and I suggested that it was just a sign of changing technologies. I thought that Data.gov and similar sites were the natural progression. Here's the problem with that argument. Congress is planning on shutting down Data.gov and other transparency sites in the next few months.
Not many people understand the importance of data privacy. They don't get how little bits of information sent from your phone every now and then can show a lot about your day-to-day life.
As the German government tries to come to a consensus about its data retention rules, Green party politician Malte Spitz retrieved six months of phone data from Deutsche Telekom (by suing them), to show what you can get from a little bit of private mobile data. He handed the data to Zeit Online, and they in turn mapped and animated practically every one of Spitz' moves over half a year and combined it with publicly available information from sources such as his appointment website, blog, and Twitter feed for more context.
“The problem isn’t that specialised companies lack the data they need, it’s that they don’t go and look for it, they don’t understand how to handle it.”—Hans Rosling, A Data State of Mind, March 2011
Google UK produced a short book called Think Quarterly to distribute to partners and advertisers, but it's actually pretty interesting for a more general audience. Articles feature Hans Rosling, Hal Varian, and others. Also a hat tip to FlowingData in Simon Rogers' list of sexy resources.
The Web is a game of pageviews, and outlets such as Twitter and Facebook are a way to rack up the counts. The more people who share your posts and articles, the more new people who visit your site. So what kind of articles are shared more often? How do people interact with these articles? Yahoo! research scientist Yury Lifshits digs into Facebook likes for some ideas, using data collected from 45 sites, 100k+ articles, and 40 million reactions, between October 2010 and January 2011.
Hot off the MIT Sloan Sports Analytics Conference, Sean Gregory argues for more "stats geeks" on the sidelines and in the huddle during the game.
[S]itting next to your team's manager, a scruffy baseball lifer, in the dugout is not just another scruffy baseball lifer, spitting tobacco. Instead, by his side is a guy with a Ph.D. in theoretical physics, a beautiful mind who can calculate complex probabilities, in real time, in his head. He can tell you the odds of so-and-so throwing such-and-such a pitch to so-and-so on such-and-such a count.
It's a fluffy article with not much on what the stat person would actually do, so you'll have to imagine. Honestly, I hope sports statistics doesn't come to that though. Unpredictability is what makes games so fun to watch.
Computers mimic human reasoning by building on simple rules and statistical averages. Test your strategy against the computer in this rock-paper-scissors game illustrating basic artificial intelligence. Choose from two different modes: novice, where the computer learns to play from scratch, and veteran, where the computer pits over 200,000 rounds of previous experience against you.
Be sure to play at least five rounds, and then click on the button to see what the computer is thinking. In veteran mode, the computer searches its database for sequences that match your last five moves and its last five moves and then tries to predict what you'll throw next.
Are you good enough to beat the basic artificial intelligence?
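The veteran mode described above can be sketched roughly like this. It's a minimal Python sketch of the matching idea, not the game's actual code: the class name `HistoryMatcher`, the lookup table of counts, and the random fallback for patterns the computer hasn't seen yet are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

MOVES = ("rock", "paper", "scissors")
# BEATS[x] is the move that defeats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class HistoryMatcher:
    """Hypothetical predictor: guesses the player's next throw from what
    followed the same recent pattern of rounds in the past."""

    def __init__(self, window=5, rng=None):
        self.window = window              # how many past rounds form a pattern
        self.rng = rng or random.Random()
        self.history = []                 # (player_move, computer_move) pairs
        self.table = defaultdict(Counter) # pattern -> counts of player's next move

    def _pattern(self):
        # The last `window` rounds of both sides' moves, or None if too few.
        if len(self.history) < self.window:
            return None
        return tuple(self.history[-self.window:])

    def predict(self):
        """Return the player's most frequent follow-up move, or None if the
        current pattern has never been seen."""
        key = self._pattern()
        if key is None or key not in self.table:
            return None
        return self.table[key].most_common(1)[0][0]

    def choose(self):
        """Pick the computer's throw: counter the prediction, else play randomly."""
        guess = self.predict()
        if guess is None:
            return self.rng.choice(MOVES)
        return BEATS[guess]

    def record(self, player_move, computer_move):
        """Store a finished round and update the counts for the pattern before it."""
        key = self._pattern()
        if key is not None:
            self.table[key][player_move] += 1
        self.history.append((player_move, computer_move))
```

To try it, call `choose()` before each round to get the computer's throw, then `record()` with both throws once the round is over. A "novice" would start with an empty table, while a "veteran" would start with one already filled from earlier games.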
The government has been making a big push for more open health-related data, and a couple of weeks ago, they released a whole bunch of it with the launch of HealthData.gov. It's the same interface as Data.gov, but for health. Additionally, the Health Indicators Warehouse launched with different data and a slightly more useable interface.
A quick scan of the data available, however, does seem to indicate that a lot of it is spotty or outdated (like on Data.gov), which doesn't make it especially useful. For example, some data sets are only one data point, while others cover only a single year. At least it's a start.
William Briggs and John Briggs examine the differences between movies that have won Best Picture and those that were top at the Box Office, based on money, gender, age, and genre. "There was only one Oscar winning movie with a leading actress older than 50: Jessica Tandy in Driving Miss Daisy. Eight women were at least 40 in Oscar winning movies, e.g. Myrna Loy, Bette Davis, Sandra Bullock. However, half of these were just 40 or 41."
Need music data? Get all the data you want and more from the freely available Million Song Dataset, offered by LabROSA at Columbia University and Echo Nest. There's lots of metadata on song features and your standard stuff like year and artist. There are also several code wrappers and samples to help researchers make use of the data right away.