  • Central limit theorem animation

    June 3, 2013  |  Statistical Visualization

    Central limit theorem animation

    The central limit theorem:

    In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.

    Victor Powell animated said random variables falling into a normal distribution (which should look familiar to those who have seen that ping pong ball exhibit in exploratoriums and science museums). Play around with the number of bins and delay time and watch it go.
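
    If you'd rather watch the theorem work in a terminal, here is a minimal sketch. It is not Powell's code; the function names, the uniform distribution, and the sample sizes are all assumptions picked for illustration. It averages batches of uniform random draws and prints a rough text histogram of the averages.

```python
import random
import statistics


def sample_means(n_per_mean, n_means, seed=0):
    """Return n_means averages, each over n_per_mean Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n_per_mean))
        for _ in range(n_means)
    ]


def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid dividing by zero if all values match
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        counts[min(int((v - lo) / step), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed settings: 2,000 means of 30 draws each, shown in 15 bins.
    means = sample_means(n_per_mean=30, n_means=2000)
    counts = bin_counts(means, bins=15)
    scale = 50 / max(counts)
    for c in counts:
        print("#" * round(c * scale))
```

    The individual draws are flat, but the averages should pile up around 0.5 in a bell shape. Change the bin count the same way you would in the animation.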

  • Ratings of TV shows over time

    May 16, 2013  |  Statistical Visualization

    Show ratings for 24

    The quality of television shows follows all kinds of patterns. Some shows stink in the beginning and slowly gain steam, whereas others are great at first and then lose momentum towards eventual cancellation. Using data from the Global Episode Opinion Survey, Andrew Clark visualized ratings over time for many popular shows in an interactive.

  • A thorough Facebook analysis by Stephen Wolfram

    April 25, 2013  |  Statistical Visualization

    Facebook networks

    Stephen Wolfram analyzed the Facebook world, based on anonymized data from the Wolfram|Alpha Data Donor program. He visits topics from how people friend, how the Facebook world compares to the real one, and how people change with age.

    People talk less about video games as they get older, and more about politics and the weather. Men typically talk more about sports and technology than women—and, somewhat surprisingly to me, they also talk more about movies, television and music. Women talk more about pets+animals, family+friends, relationships—and, at least after they reach child-bearing years, health. The peak time for anyone to talk about school+university is (not surprisingly) around age 20. People get less interested in talking about "special occasions" (mostly birthdays) through their teens, but gradually gain interest later. And people get progressively more interested in talking about career+money in their 20s. And so on. And so on.

    Worth the full read.

  • Analysis of baseball ticket pricing

    April 10, 2013  |  Statistical Visualization

    Baseball ticket pricing

    If you've ever looked at ticket prices for sporting events, you probably noticed the disparity between games against a popular team or a rival and games against a less-than-stellar team. Last time I looked, a ticket to watch the Golden State Warriors play the Lakers or Heat cost twice as much as one to see them play the Kings. David Yanofsky for Quartz noted the same pricing strategy in baseball.

    The heat map above shows the effect of visiting teams on ticket prices. As you'd expect (if you follow baseball even just a tiny bit), price goes up significantly when the New York Yankees come to town. In contrast, the price goes down when the Seattle Mariners show up.

    There's clearly a supply and demand thing going on here. Nobody wants to see bad teams play. But now it's time to pull a Billy Beane. How little can you spend on a team and a stadium and still make a profit? [Thanks, David]

  • Chartspotting: Coffee graph menu

    March 29, 2013  |  Statistical Visualization

    Coffee menu

    FlowingData reader Amir sent this along. In lieu of a list of coffee drinks, this place in East London opted for ingredient breakdowns. I'm guessing there's a standard menu outside the frame, because otherwise, coffee neophytes (like me) would have no clue what to do. Anyone care to fill in the blanks?

    Spot any charts in the wild? You should email me a picture.

  • Internet Census

    March 22, 2013  |  Statistical Visualization

    Internet map

    Upon discovering hundreds of thousands of open embedded devices on the Internet, an anonymous researcher conducted a Census of the Internet, mapping 460 million IP addresses around the world.

    While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.

    It's a pretty thorough analysis, but the conclusion interested me most:

    The why is also simple: I did not want to ask myself for the rest of my life how much fun it could have been or if the infrastructure I imagined in my head would have worked as expected. I saw the chance to really work on an Internet scale, command hundred thousands of devices with a click of my mouse, portscan and map the whole Internet in a way nobody had done before, basically have fun with computers and the Internet in a way very few people ever will. I decided it would be worth my time.

    It makes me feel...uneasy. [Thanks, Roger]

  • Betting lines for becoming the next pope

    March 5, 2013  |  Statistical Visualization

    Probability of next pope

    Who's going to be the next pope? I know all of you are sitting on the edge of your seats. Luckily, an analytical research manager who goes by the name AJ hacked together a pope tracker.

    Despite not being Catholic, the papal election fascinates me. Not sure if it’s the old rituals, the world-wide interest, or simply the fact that the Catholic Church has left a huge mark on history.

    There’s no way I know enough about the inner workings of the Catholic Church to have any idea on who the next Pope may be.

    Since domain knowledge is out, the next best option?

    Follow the money!

    He's scraping odds of possible candidates becoming pope from a betting site, and the above shows the numbers over time. The odds were bumpy at first, but there seems to be some convergence, and as of this writing, Cardinal Peter Turkson from Ghana is the heavy favorite. [via Revolutions]
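
    For anyone wondering how betting odds turn into a probability chart, here is a minimal sketch. It is not AJ's code. It assumes the site quotes fractional odds such as "5/2" (the post doesn't say which format was scraped), and the candidate labels and odds below are made up. Fractional odds of a/b imply a probability of b / (a + b), and because bookmakers build in a margin, the implied probabilities are rescaled so they sum to one.

```python
from fractions import Fraction


def implied_probability(fractional_odds):
    """Convert fractional odds like '5/2' into the implied probability."""
    numerator, denominator = (int(part) for part in fractional_odds.split("/"))
    return Fraction(denominator, numerator + denominator)


def normalize(probabilities):
    """Rescale implied probabilities so they sum to 1 (removes the margin)."""
    total = sum(probabilities.values())
    return {name: p / total for name, p in probabilities.items()}


if __name__ == "__main__":
    # Hypothetical odds for illustration only, not scraped data.
    odds = {"Candidate A": "5/2", "Candidate B": "3/1", "Candidate C": "1/1"}
    implied = {name: implied_probability(o) for name, o in odds.items()}
    for name, p in normalize(implied).items():
        print(f"{name}: {float(p):.1%}")
```

    Run that once a day against fresh odds and you have the raw material for a tracker like the one above.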

  • State of the Union address decreasing reading level

    February 12, 2013  |  Statistical Visualization

    State of the Union address reading level

    With the State of the Union address tonight, The Guardian plotted the Flesch-Kincaid grade levels for past addresses. Each circle represents a State of the Union address and is sized by the number of words used. Color is used to provide separation between presidents. For example, Obama's State of the Union last year was around the eighth-grade level, and in contrast, James Madison's 1815 address had a reading level of 25.3.

    My guess is this has to do with changes in how we write and talk more than anything else. Lee Drutman and Dan Drinkard for the Sunlight Foundation ran a more rigorous analysis on Congressional records back in May, and the declining trend is similar.
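
    For reference, the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Here is a minimal sketch of that formula. It is not The Guardian's method; the function names are mine, and the syllable counter is a crude vowel-group heuristic, so expect scores to drift a bit from what dedicated tools report.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of `text`, using naive sentence splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    print(round(flesch_kincaid_grade("The cat sat. The dog ran."), 2))
```

    Long sentences and long words both push the score up, which is why an address full of multi-clause sentences can land well above grade 12.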

  • Super Bowl ad costs vs. company profit during game

    February 1, 2013  |  Statistical Visualization

    Ritchie King for Quartz compared money spent on Super Bowl ads (now about $3.75 million for a 30-second spot) to how much the companies make on average in 3 and a half hours (the average length of a game).

    It's impossible to say exactly how much a successful Super Bowl ad ultimately earns a company. Surely the Wassup commercials were a huge boon for the Budweiser brand—but how huge?

    One thing is clear though: for the biggest advertisers, that $3.75 million is truly a pittance. In fact, some of them make almost as much in profits in an average 3.5 hours—roughly the time it takes to air the Super Bowl itself.

    Note that spending (on the bottom) is total between 2002 and 2011, and the vertical scales are different (so it probably would've been good to give more visual separation between the two charts), but still, kind of an interesting perspective.

  • Baseball Hall of Fame voting trajectories

    January 30, 2013  |  Statistical Visualization

    Hall of fame voting trajectories

    Carlos Scheidegger and Kenny Shirley, along with Chris Volinsky, visualized Major League Baseball Hall of Fame voting, from the first class in 1936 (which included Babe Ruth) up to present.

    All a fan can do is accept that Baseball Hall of Fame voting, conducted by the Baseball Writers Association of America (BBWAA), is a phenomenon unto itself. If we can't understand baseball Hall of Fame voting, though, maybe the next best thing is visualizing the data behind it. The set of interactive plots on this webpage is our attempt to do that. We were especially interested in two things: (1) viewing the trajectories of BBWAA vote percentage by year for different players throughout history, and (2) simultaneously viewing the career statistics of these players, to help find patterns and explain their trajectories (or to reassure ourselves that the writers really are crazy).

    The interactive is on the analysis side of the spectrum, so you might be a bit lost if you don't know a lick about baseball. However, if you're a baseball fan, there's a lot to play around with and dimensions to poke around at, as you can filter on pretty much all player stats such as home run count, batting average, and innings played. At the very least, you're getting a peek at how statisticians pick and prod at their data.

    Start at the examples section for quick direction. I eventually found myself looking for downward trajectories. Poor Mark McGwire. [Thanks, Chris]

  • Character mentions in Les Miserables

    January 14, 2013  |  Statistical Visualization

    Les mis character mentions

    Jeff Clark took a detailed look at Victor Hugo's Les Miserables via character mentions, word connections, and word usage. The above is character mentions with color showing sentiment. Red means negative, and blue positive.

    Characters are listed from top to bottom in their order of appearance. The horizontal space is segmented into the 5 volumes of the novel. Each volume is subdivided further with a faint line indicating the various books and, finally, small rectangles indicate the chapters within the books. In the 5 volumes there are a total of 48 books and 365 chapters. The height of the small rectangles indicate how frequently that character is mentioned in that particular chapter.

    There's a good amount of blue towards the end, when everyone decides everyone else isn't so bad.

    See the full version and other views here.

  • Five years of traffic fatalities

    January 8, 2013  |  Statistical Visualization

    Traffic fatalities - alcohol a factor

    I made a graphic a while back that showed traffic fatalities over a year. John Nelson expanded on that, pulling five years of data and subsetting by some factors: alcohol, weather, and whether a pedestrian was involved. And he aggregated by time of day and day of week instead of calendar dates.

  • Longer life expectancy, more years of disease

    December 19, 2012  |  Statistical Visualization

    Life expectancy and healthy years

    Bonnie Berkowitz, Emily Chow and Todd Lindeman for the Washington Post plotted life expectancy against percentage of healthy years. Although life expectancy is increasing, the percentage of years living without disease isn't quite keeping up.

    People are living longer lives, but the time they are gaining isn't entirely time with good health. For every year of life expectancy added since 1990, about 9 1/2 months is time in good health. The rest is time in a diminished state — in pain, immobility, mental incapacity or medical support such as dialysis. For people who survive to age 50, the added time is "discounted" even further. For every added year they get, only seven months are healthy.

    On the other hand, the total number of expected years in good health is still on the plus side, and I think most people would choose years in poor health over fewer years. So it's not all bad news.

  • Get a visual recap of your year on Twitter

    December 11, 2012  |  Statistical Visualization

    Year on Twitter

    As 2013 nears, let the recaps, reviews, and best ofs begin. Twitter put up their 2012 year in review of top tweets, trends, and such, which is mostly pictures and lists, but in collaboration with Vizify, they also have a section to visualize your own tweets. Click on the "View year on Twitter" button in the top right. Here's mine, for example. (Surprise, I mention maps, data, and charts often.)

    It's a word frequency chart that shows usage over the year. Scroll left to right or mouse over bubbles to see specific tweets. Mostly, it's just fun to look back. [Thanks, Todd]

  • How tax rates have changed

    November 30, 2012  |  Statistical Visualization

    Changing tax burden

    Mike Bostock, Matthew Ericson and Robert Gebeloff for the New York Times explored changing tax rates from 1980 to 2010, for various income levels.

    Most Americans paid less in taxes in 2010 than people with the same inflation-adjusted incomes paid in 1980, because of cuts in federal income taxes. At lower income levels, however, much of the savings was offset by increases in federal payroll taxes, state sales taxes and local property taxes. About half of households making less than $25,000 saved nothing at all.

    Instead of trying to squeeze everything into one space, the graphic reads like a story, with changes in different types of taxes and comparisons across income levels.

  • Mitt Romney losing likes on Facebook, in real-time

    November 12, 2012  |  Statistical Visualization

    Mitt Romney unlikes on Facebook

    If you go to the Facebook page for Mitt Romney, note the number of likes, wait a few seconds, and then refresh the page. The number of likes is decreasing fast enough that you can see the change over a short period of time. Disappearing Romney charts the change in real-time.

    Tick, tick, tick.

    See also Who Likes Mitt, with the quick API hack on github. [via @moebio]

  • History of film, 100 years in a chart

    November 2, 2012  |  Statistical Visualization

    History of Film

    In something of an homage to the Genealogy of Pop & Rock Music by Reebee Garofalo, designer Larry Gormley visualized 100 years of film.

    This graphic chronicles the history of feature films from the origins in the 1910s until the present day. More than 2000 of the most important feature-length films are mapped into 20 genres spanning 100 years. Films selected to be included have: won important awards such as the best picture Academy Award; achieved critical acclaim according to recognized film critics; are considered to be key genre films by experts; and/or attained box office success.

    Available in print for 34 bones.

  • Lord of the Rings visualized

    October 24, 2012  |  Statistical Visualization

    Decline of the longevity of men

    Driven by his love for Lord of the Rings, Emil Johansson explores the many facets of the world in charts and graphs. For example, the above chart is the declining lifespan of man.

    It is explicitly stated by Tolkien that the longevity of Men once granted to the Númenóreans decreased over the years. In Letter 156 Tolkien writes that "a good Númenórean died of free will when he felt it be the time to do so". With the Shadow and the Downfall of Númenor this grace was taken away from them and they died involuntarily with a decreasing lifespan.

    The decreasing life span is seen clearly in the graph. The most dramatic change is shortly before the Downfall of Númenor. The rulers are shown in order. Their number should not be confused with how many generations from Elros Aragorn is since there were more than one line of rulers.

    There's also a geographic map of where characters traveled, a family tree, a timeline, and even an Android app. I think Johansson might be a superfan. A hunch.

  • Presidential campaign finance explorer

    September 26, 2012  |  Statistical Visualization

    Presidential campaign finance explorer

    Hey, I think it's election season, and you know what that means. It's time to dig into campaign finance data from the Federal Election Commission. The Washington Post gives you a view into the amount of money raised and spent in both camps, where it's coming from and where it's going. They start with the high-level aggregates, and as you scroll down, you get the time series, followed by the breakdowns for money raised.

    The spending categories at the bottom are the most interesting bit. They cover advertising and mail, down to consulting and events. Payroll was a lot higher than I would've thought.

  • Color names plotted against gender

    September 20, 2012  |  Statistical Visualization

    His and Hers Colors by Stephen Von Worley

    A couple of years ago, xkcd ran a survey that asked people to name colors. Stephen Von Worley plotted that data by gender in an interactive.

    That's a dot for each of the 2,000 most commonly-used color names as harvested from the 5,000,000-plus-sample results of XKCD's color survey, sized by relative usage and positioned side-to-side by average hue and vertically by gender preference. Women tend to use color names nearer the top, men towards the bottom, and the dashed line represents the 50-50 split (equal usage by both sexes).

    While his original version was static, the interactive version lets you sort by hue, saturation, brightness, popularity, and name length. Most importantly, you can see the color names now when you mouse over. I like the vertical spectrum of purple, where women use names like bright lilac, orchid, and heather, and men tend to label similar shades as purplish, lightish purple, and oh yes, very light purple. [Thanks, Stephen]

Unless otherwise noted, graphics and words by me are licensed under Creative Commons BY-NC. Contact original authors for everything else.