Almost a year and a half ago, Infochimps, the data repository slash marketplace, released a giant scrape of Twitter data representing 2.7 million users, 10 million tweets, and 58 million connections. Twitter soon requested that they take it down while they figured out how they wanted to handle licensing, privacy, etc.
That was in 2008, before Twitter really started booming. Fast forward to now. Twitter and Infochimps have figured out what they want to do, and the Twitter census data is back up. It’s no longer a measly 2.7 million users though. The population has grown to 35 million.
This time around, instead of one big data dump, Infochimps provides large datasets for several metrics. Some are free. Some are not. Since there’s no easy way to split up free from non-free or sort by price on Infochimps, I’ve saved you the trouble and separated them for you.
Here are the free ones:
- Conversation Metrics: One year of URLs, Hashtags, Smileys usage (Smiley Counts)
- Twitter Users by Background Color
- Twitter Users by Friends Count
- Twitter Users by Followers Count
- Twitter Users by Month Added
- Twitter Users by Day Added
- Twitter Users by Hour Added
- Tweets by Month Tweeted
- Tweets by Day Tweeted
- Twitter Users by Location
- Smileys
- Tweets by Hour Tweeted
These will cost you, with prices ranging from $20 all the way up to $800. Generally speaking, the free data is a subset of the paid data.
- Conversation Metrics: One year of URLs, Hashtags, Smileys usage (monthly)
- Developer Tools – Mapping from Twitter User Search ID to Twitter API IDs
- Conversation Metrics: One year of URLs, Hashtags, Smileys usage (by Hour)
- Hashtags, URLs, Smileys by Month
- Stock Tweets
- Hashtags, URLs, Smileys by Day
- Hashtags, URLs, Smileys by Hour
- Trst Rank
So there you go. No more wasting time trying to get crafty with the Twitter API limits. It’s all there at your disposal. Now what are you going to do with it?
Very interesting stuff! The one thing I’m really interested in learning about Twitter users is browser stats: what browser they’re using, what their screen resolutions are, etc. Is this data out there anywhere?