For the downtime post-turkey. James Trimble stuck the top 200 reddits of all time into a treemap. Let the time suck begin.
Presented mostly for my fond memories as a grade schooler, with a fresh 2400 bps modem in the 486, who recently discovered something called a BBS. Those were the good old days. My dad got me a 50-foot phone line to run from the computer to the phone jack in the back corner of another room.
Dan Delany took a simple look at furloughed employees due to the government shutdown. There are tickers for duration, estimated unpaid salary, and estimated food vouchers unpaid, but the main view is the interactive tree map that shows furloughed proportions by department.
Weddings are special events where friends and family come together to celebrate, and we encapsulate them in their special day. What if you looked at weddings over time though? Todd Schneider provides a view into wedding announcements in The New York Times in Wedding Crunchers, and although the announcements are mostly New York-based, you get a peek into events and social trends. Simply enter terms or phrases and see the trends over time.
Be sure to check out Schneider's detailed description and highlights of the data. [Thanks, Todd]
Forecast, one of the best if not the best quick-look weather sites, uses various weather models to predict temperature, wind, humidity, and pressure. Whereas the main result is an estimated map view along with highs and lows for the week, Forecast Lines shows you the weather models that drive the site.
Forecast works by statistically aggregating a number of different weather models into a single forecast. Because I can peek under the hood, I was able to take a look at all the raw models and see how many dipped below freezing. I saw that none of them did, which gave me confidence that my plants would be okay.
Today we’re launching a new weather app that lets everyone “peek under the hood.” We’re calling it Forecast Lines.
And like the main Forecast site, it works fine and dandy on your iPad or mobile device.
Allison McCann for Businessweek graphed rappers' claimed wealth in their songs versus their actual wealth.
Fresh off of Jay-Z's new album is the track Versus, on which he chides fellow hip-hop artists and their dubious tales of extraordinary wealth: "The truth in my verses, versus, your metaphors about what your net worth is." Like Jay-Z, we’ve long been skeptical of just how wealthy some hip-hop stars claim to be, so we created a way to separate the truly rich from the loud-mouth lyricists.
As you'd expect, some rappers tend to exaggerate. Speaking of which, this seems like a good time to revisit the map that shows the area codes where Ludacris claims to have hoes. Unfortunately, there is no data to verify or debunk.
Data from an experiment may appear rock solid. Upon further examination, the data may morph into something much less firm. A knee-jerk reaction to this conundrum may be to try and hide uncertain scientific results, which are unloved fellow travelers of science. After all, words can afford ambiguity, but with visuals, "we are damned to be concrete," says Bang Wong, who is the creative director of the Broad Institute of MIT and Harvard. The alternative is to face the ambiguity head-on through visual means.
I still struggle with uncertainty and visualization. I haven't seen many worthwhile solutions other than the old standbys, boxplots and histograms, which show distributions. But how many people understand spread, skew, etc? It's a small proportion, which poses an interesting challenge.
Damien Hirst is an artist known for a number of works, one of those being his large production of spot paintings. There are over a thousand of them painted by him and his assistants, varying in size, number of dots, density, and color. Amanda Cox of The New York Times plotted paintings sold from 1999 to present, topping out at $3.4 million. That's a whole lot of dottage.
In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.
Victor Powell animated said random variables falling into a normal distribution (which should look familiar to those who have seen that ping pong ball exhibit in exploratoriums and science museums). Play around with the number of bins and delay time and watch it go.
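If you want to poke at the idea without the animation, here is a minimal sketch of a ping pong ball board in Python. It is not Powell's code. The function names, the 50/50 bounce at each peg, and the fixed seed are all my own assumptions for illustration.

```python
import random


def galton_board(n_balls, n_bins, seed=0):
    """Drop n_balls through n_bins - 1 rows of pegs and count where they land."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    counts = [0] * n_bins
    for _ in range(n_balls):
        # Each peg sends the ball left (0) or right (1) with equal chance.
        # The landing bin is the number of rightward bounces.
        landing_bin = sum(rng.random() < 0.5 for _ in range(n_bins - 1))
        counts[landing_bin] += 1
    return counts


def text_histogram(counts, width=40):
    """Return one line of '#' marks per bin, scaled to the tallest bin."""
    peak = max(counts)
    return [f"{i:2d} {'#' * round(width * c / peak)}" for i, c in enumerate(counts)]


if __name__ == "__main__":
    for line in text_histogram(galton_board(n_balls=5000, n_bins=11)):
        print(line)
```

Each ball's landing spot is a sum of independent left-or-right bounces, so the pile-up in the middle bins is the central limit theorem at work. Change n_bins and n_balls and watch the shape hold.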
The quality of television shows follows all kinds of patterns. Some shows stink in the beginning and slowly gain steam, whereas others are great at first and then lose momentum towards eventual cancellation. Using data from the Global Episode Opinion Survey, Andrew Clark visualized ratings over time for many popular shows in an interactive.
Stephen Wolfram analyzed the Facebook world, based on anonymized data from the Wolfram|Alpha Data Donor program. He covers topics ranging from how people friend, to how the Facebook world compares to the real one, to how people change with age.
People talk less about video games as they get older, and more about politics and the weather. Men typically talk more about sports and technology than women—and, somewhat surprisingly to me, they also talk more about movies, television and music. Women talk more about pets+animals, family+friends, relationships—and, at least after they reach child-bearing years, health. The peak time for anyone to talk about school+university is (not surprisingly) around age 20. People get less interested in talking about "special occasions" (mostly birthdays) through their teens, but gradually gain interest later. And people get progressively more interested in talking about career+money in their 20s. And so on. And so on.
Worth the full read.
If you've ever looked at ticket prices for sporting events, you probably noticed the disparity in prices when your team plays a popular team or a rival versus a less-than-stellar team. Last time I looked, a ticket to watch the Golden State Warriors play the Lakers or Heat was twice as much as when they played the Kings. David Yanofsky for Quartz noted the same pricing strategy in baseball.
The heat map above shows the effect of visiting teams on ticket prices. As you'd expect (if you follow baseball even just a tiny bit), price goes up significantly when the New York Yankees come to town. In contrast, the price goes down when the Seattle Mariners show up.
There's clearly a supply and demand thing going on here. Nobody wants to see bad teams play. But now it's time to pull a Billy Beane. How little can you spend on a team and a stadium and still make a profit? [Thanks, David]
FlowingData reader Amir sent this along. In lieu of a list of coffee drinks, this place in East London opted for ingredient breakdowns. I'm guessing there's a standard menu outside the frame, because otherwise, coffee neophytes (like me) would have no clue what to do. Anyone care to fill in the blanks?
Spot any charts in the wild? You should email me a picture.
Upon discovering hundreds of thousands of open embedded devices on the Internet, an anonymous researcher conducted a Census of the Internet, mapping 460 million IP addresses around the world.
While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.
It's a pretty thorough analysis, but the conclusion interested me most:
The why is also simple: I did not want to ask myself for the rest of my life how much fun it could have been or if the infrastructure I imagined in my head would have worked as expected. I saw the chance to really work on an Internet scale, command hundred thousands of devices with a click of my mouse, portscan and map the whole Internet in a way nobody had done before, basically have fun with computers and the Internet in a way very few people ever will. I decided it would be worth my time.
It makes me feel...uneasy. [Thanks, Roger]
Who's going to be the next pope? I know all of you are sitting on the edge of your seats. Luckily, an analytical research manager who goes by the name AJ hacked together a pope tracker.
Despite not being Catholic, the papal election fascinates me. Not sure if it’s the old rituals, the world-wide interest, or simply the fact that the Catholic Church has left a huge mark on history.
There’s no way I know enough about the inner workings of the Catholic Church to have any idea on who the next Pope may be.
Since domain knowledge is out, the next best option?
Follow the money!
He's scraping odds of possible candidates becoming pope from a betting site, and the above shows the numbers over time. The odds were bumpy at first, but there seems to be some convergence, and as of this writing, Cardinal Peter Turkson from Ghana is the heavy favorite. [via Revolutions]
With the State of the Union address tonight, The Guardian plotted the Flesch-Kincaid grade levels for past addresses. Each circle represents a State of the Union address and is sized by the number of words used. Color is used to provide separation between presidents. For example, Obama's State of the Union last year was around the eighth-grade level, and in contrast, James Madison's 1815 address had a reading level of 25.3.
My guess is this has to do with changes in how we write and talk more than anything else. Lee Drutman and Dan Drinkard for the Sunlight Foundation ran a more rigorous analysis on Congressional records back in May, and the declining trend is similar.
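For a sense of what goes into a grade level, the standard Flesch-Kincaid formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Below is a rough sketch in Python. The formula is the published one, but the syllable counter is a crude vowel-group heuristic of my own, and all function names are made up for illustration, so don't expect it to reproduce The Guardian's numbers exactly.

```python
import re


def fk_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def crude_syllables(word):
    """Assumption: approximate syllables as runs of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade_of_text(text):
    """Estimate the grade level of a passage using the crude counter above."""
    # Sentences are split on runs of ., !, or ? (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(crude_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)
```

Longer sentences and longer words both push the score up, which is how an address full of sprawling nineteenth-century sentences can land above a grade level of 25.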
Ritchie King for Quartz compared money spent on Super Bowl ads — now about $3.75 million for a 30-second spot — to how much the companies make on average in 3 and a half hours (the average length of a game).
It's impossible to say exactly how much a successful Super Bowl ad ultimately earns a company. Surely the Wassup commercials were a huge boon for the Budweiser brand—but how huge?
One thing is clear though: for the biggest advertisers, that $3.75 million is truly a pittance. In fact, some of them make almost as much in profits in an average 3.5 hours—roughly the time it takes to air the Super Bowl itself.
Note that spending (on the bottom) is total between 2002 and 2011, and the vertical scales are different (so it probably would've been good to give more visual separation between the two charts), but still, kind of an interesting perspective.
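The arithmetic behind the comparison is simple enough to sketch. This is my own back-of-the-envelope version in Python, not King's method. It assumes profit accrues evenly across a 365-day year, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: profit is spread evenly over the year


def profit_in_window(annual_profit, hours=3.5):
    """Profit earned during a window of the given length, such as one game."""
    return annual_profit * hours / HOURS_PER_YEAR


def break_even_annual_profit(ad_cost, hours=3.5):
    """Annual profit at which one window of earnings equals the ad cost."""
    return ad_cost * HOURS_PER_YEAR / hours


if __name__ == "__main__":
    needed = break_even_annual_profit(3.75e6)
    print(f"Annual profit needed to cover a 30-second spot: ${needed:,.0f}")
```

Under that even-spread assumption, a company needs about $9.4 billion in annual profit to earn back a $3.75 million spot in the 3.5 hours it takes to play the game.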
Carlos Scheidegger and Kenny Shirley, along with Chris Volinsky, visualized Major League Baseball Hall of Fame voting, from the first class in 1936 (which included Babe Ruth) up to present.
All a fan can do is accept that Baseball Hall of Fame voting, conducted by the Baseball Writers Association of America (BBWAA), is a phenomenon unto itself. If we can't understand baseball Hall of Fame voting, though, maybe the next best thing is visualizing the data behind it. The set of interactive plots on this webpage is our attempt to do that. We were especially interested in two things: (1) viewing the trajectories of BBWAA vote percentage by year for different players throughout history, and (2) simultaneously viewing the career statistics of these players, to help find patterns and explain their trajectories (or to reassure ourselves that the writers really are crazy).
The interactive is on the analysis side of the spectrum, so you might be a bit lost if you don't know a lick about baseball. However, if you're a baseball fan, there's a lot to play around with and dimensions to poke around at, as you can filter on pretty much all player stats such as home run count, batting average, and innings played. At the very least, you're getting a peek at how statisticians pick and prod at their data.
Start at the examples section for quick direction. I eventually found myself looking for downward trajectories. Poor Mark McGwire. [Thanks, Chris]
Jeff Clark took a detailed look at Victor Hugo's Les Miserables via character mentions, word connections, and word usage. The above is character mentions with color showing sentiment. Red means negative, and blue positive.
Characters are listed from top to bottom in their order of appearance. The horizontal space is segmented into the 5 volumes of the novel. Each volume is subdivided further with a faint line indicating the various books and, finally, small rectangles indicate the chapters within the books. In the 5 volumes there are a total of 48 books and 365 chapters. The height of the small rectangles indicate how frequently that character is mentioned in that particular chapter.
There's a good amount of blue towards the end, when everyone decides everyone else isn't so bad.
See the full version and other views here.
I made a graphic a while back that showed traffic fatalities over a year. John Nelson expanded on that, pulling five years of data and subsetting by some factors: alcohol, weather, and whether a pedestrian was involved. And he aggregated by time of day and day of week instead of calendar dates.
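Aggregating by day of week and hour instead of calendar date is a small change in code. Here is a minimal sketch in Python, assuming each record is a dictionary with a "time" entry and optional true/false flags such as "alcohol". Those field names and the function name are my own placeholders, not the layout of the data Nelson used.

```python
from collections import Counter
from datetime import datetime


def count_by_weekday_and_hour(records, flag=None):
    """Count records per (weekday, hour) cell, optionally keeping only flagged ones.

    weekday follows datetime.weekday(): 0 is Monday, 6 is Sunday.
    """
    counts = Counter()
    for record in records:
        if flag is not None and not record.get(flag, False):
            continue  # skip records that don't match the requested subset
        when = record["time"]
        counts[(when.weekday(), when.hour)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        {"time": datetime(2013, 1, 7, 23, 15), "alcohol": True},
        {"time": datetime(2013, 1, 14, 23, 40), "alcohol": False},
        {"time": datetime(2013, 1, 12, 2, 5), "alcohol": True},
    ]
    print(count_by_weekday_and_hour(sample))
    print(count_by_weekday_and_hour(sample, flag="alcohol"))
```

The resulting counts fill a seven-by-twenty-four grid, one cell per day of week and hour.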