• Find new beers to drink

    March 5, 2014  |  Network Visualization

    Beer similarities

    Based on reviews from BeerAdvocate, Beer Viz, a visualization class project, asks you to choose a general style of beer and a beer that you like. Then it shows you beers that are similar, based on appearance, taste, aroma, and overall score. It's like a visual version of the beer recommendation system we saw last year.

  • Basketball movements visualized

    March 4, 2014  |  Mapping

    Tim Duncan movements

    The NBA has been kind of gaga over data the past few years, and they recently announced that all 30 teams would have player tracking installed so they can see where they go at night after games. Wait, no. I mean so that there is data on where each player is on the court at any given time. Fathom Information Design played with some of this data for an Oklahoma City versus San Antonio game, with some sketches.

    Above are the movements of power forward Tim Duncan, who sticks around the middle of the court throughout a game. A guard, on the other hand, runs around the court more. This is obvious if you've watched Duncan play, but sketches like this coupled with spatiotemporal analysis could be interesting.

    Also, I get the sense that there are more people who want to know about this data than there are people who know how to analyze it, so if you're a statistician on the job hunt, there's that.

  • ProPublica opened a data store

    March 4, 2014  |  Data Sources

    One of the main challenges of any data project is getting the data. It seems obvious, but the effort to get the right data to answer a question seems to catch people off guard. Even data that's "free" to download can be a huge pain that ends up completely useless. ProPublica, the non-profit newsroom, deals with this stuff on a regular basis and hopes that some of their efforts can turn into a source of funding through the Data Store.

    Like most newsrooms, we make extensive use of government data — some downloaded from "open data" sites and some obtained through Freedom of Information Act requests. But much of our data comes from our developers spending months scraping and assembling material from web sites and out of Acrobat documents. Some data requires months of labor to clean or requires combining datasets from different sources in a way that's never been done before.

    In the Data Store you'll find a growing collection of the data we've used in our reporting. For raw, as-is datasets we receive from government sources, you'll find a free download link that simply requires you agree to a simplified version of our Terms of Use. For datasets that are available as downloads from government websites, we've simply linked to the sites to ensure you can quickly get the most up-to-date data.

    For datasets that are the result of significant expenditures of our time and effort, we're charging a reasonable one-time fee: In most cases, it's $200 for journalists and $2,000 for academic researchers.

    I hope it works.

  • Solar time versus standard time around the world

    March 3, 2014  |  Mapping

    How much is time wrong around the world?

    After noting the later dinner time in Spain, Stefano Maggiolo pointed to relatively late sunsets, compared to standard time, as one possible reason. Then he mapped solar time versus standard time around the world.

    Looking for other regions of the world having the same peculiarity of Spain, I edited a world map from Wikipedia to show the difference between solar and standard time. It turns out, there are many places where the sun rises and sets late in the day, like in Spain, but not a lot where it is very early (highlighted in red and green in the map, respectively). Most of Russia is heavily red, but mostly in zones with very scarce population; the exception is St. Petersburg, with a discrepancy of two hours, but the effect on time is mitigated by the high latitude. The most extreme example of Spain-like time is western China: the difference reaches three hours against solar time. For example, today the sun rises there at 10:15 and sets at 19:45, and solar noon is at 15:01.
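
    The size of the discrepancy described in the quote can be roughly approximated from longitude alone. The sketch below assumes that mean solar time shifts by four minutes per degree of longitude (360 degrees in 24 hours) and ignores the equation of time and daylight saving time. The function name and the rounded coordinates are illustrative assumptions, not values taken from Maggiolo's map.

```python
def solar_offset_minutes(longitude_deg, utc_offset_hours):
    """Minutes by which clock time runs ahead of mean solar time.

    Positive means the sun rises and sets "late" by the clock.
    Assumes 4 minutes per degree of longitude and ignores the
    equation of time and daylight saving time.
    """
    # Meridian on which the zone's clock time matches mean solar time.
    zone_meridian_deg = 15.0 * utc_offset_hours
    return 4.0 * (zone_meridian_deg - longitude_deg)


# Rounded, illustrative coordinates (east longitude is positive).
places = {
    "Madrid (UTC+1)": (-3.7, 1),
    "Kashgar, western China (UTC+8)": (76.0, 8),
    "London (UTC+0)": (-0.1, 0),
}

for name, (lon, utc) in places.items():
    print(f"{name}: {solar_offset_minutes(lon, utc):+.0f} min")
```

    Under these assumptions, the western China example comes out close to three hours, which is in line with the quoted figure.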

  • Why you should buy the bigger pizza

    February 28, 2014  |  Statistical Visualization

    Pizza price

    Because you get more pizza to eat, and if you don't finish it, you'll have breakfast tomorrow. Other than that fine reason, well, it's geometrically the better deal. Planet Money explains with an interactive that shows the price per square inch for 3,678 pizza places across the United States, based on data from Grubhub.

    The math of why bigger pizzas are such a good deal is simple: A pizza is a circle, and the area of a circle increases with the square of the radius.
    More pizza more problems

    So, for example, a 16-inch pizza is actually four times as big as an 8-inch pizza.

    And when you look at thousands of pizza prices from around the U.S., you see that you almost always get a much, much better deal when you buy a bigger pizza.

    You get more pizza, and the business gets more money with minimal extra pizza-making effort. Win-win. Although, keep going on the horizontal axis and I bet that curve starts to curl up. Where can I get a ten-foot pizza?
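
    The geometry is easy to check yourself. Here is a minimal sketch, assuming a perfectly round pizza sold by diameter. The menu prices are made up for illustration and do not come from the Planet Money or Grubhub data.

```python
import math


def pizza_area(diameter_in):
    """Area of a round pizza in square inches."""
    radius = diameter_in / 2
    return math.pi * radius ** 2


def price_per_square_inch(diameter_in, price):
    """Dollars per square inch for a round pizza."""
    return price / pizza_area(diameter_in)


def area_ratio(big_diameter_in, small_diameter_in):
    """How many times more pizza the bigger size gives you."""
    return pizza_area(big_diameter_in) / pizza_area(small_diameter_in)


# Hypothetical menu: diameter in inches -> price in dollars.
menu = {8: 9.00, 12: 13.00, 16: 17.00}

for diameter, price in menu.items():
    unit_price = price_per_square_inch(diameter, price)
    print(f'{diameter}" pizza: ${unit_price:.3f} per square inch')

print(f"16-inch vs. 8-inch area: {area_ratio(16, 8):.1f}x")
```

    Doubling the diameter quadruples the area, so unless the price also quadruples, the bigger pizza costs less per square inch.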

  • How to Read and Use Histograms

    February 27, 2014  |  Tutorials

    How to Read Histograms and Use Them in R

    The histogram is one of my favorite chart types, and for analysis purposes, it's probably the one I use most. Devised by Karl Pearson (the father of mathematical statistics) in the late 1800s, it's simple geometrically, robust, and allows you to see the distribution of a dataset.

    If you don't understand what's driving the chart though, it can be confusing, which is probably why you don't see it often in general publications.
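
    What drives the chart is just counting. As a rough sketch (in Python rather than R, and not the tutorial's own code), the function below splits the range of the data into equal-width bins and counts how many values land in each. The function name and the sample values are made up for illustration.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min to max."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value goes in the last bin instead of a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


# Made-up sample data.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(values, bins=4)

# Each bar's length is the number of values in that interval.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f} | {'#' * count}")
```

    The bar heights depend on the bin width you choose, which is one reason the same data can look different from one histogram to the next.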

  • Job Board, February 2014

    February 26, 2014  |  Job Board

    Looking for a job in data science, visualization, or statistics? There are openings on the board.

    Senior Associate Director, Analytics for the University of Chicago in Chicago, Illinois.

    Data Scientist for Thumbtack in San Francisco, California.

    Communications Officer, Measurement and Analysis for the Bill and Melinda Gates Foundation in Seattle, Washington.

    Senior Graphics Editor for The Wall Street Journal in New York, New York.

    Basketball Analyst for the Philadelphia 76ers in Philadelphia, Pennsylvania.

  • Game theory to win game shows

    February 26, 2014  |  Statistics

    I like how a little bit of game theory has crept into Jeopardy! with contestant Arthur Chu. He bounces around the board in search of Daily Doubles and bets to tie in Final Jeopardy. Chu doesn't know much about game theory himself but applies rules promoted by a past contestant.

    The ultimate champion, Ken Jennings, praises Chu on Slate.

    But in fact, plenty of nice white boys on Jeopardy! have been pilloried by viewers for using Arthur Chu's signature technique: bopping around the game board seemingly at whim, rather than choosing the clues from top to bottom, as most contestants do. This is Chu's great crime, the kind of anarchy that hard-core Jeopardy! fans will not countenance. The technique was pioneered in 1985 by a five-time champ named Chuck Forrest, whose law school roommate suggested it. The "Forrest bounce," as fans still call it, kept opponents off balance. He would know ahead of time where the next clue would pop up; they’d be a second slow.

    I don't watch Jeopardy! much, but it's pretty fun to watch Chu dominate.

    Then there's the most recent RadioLab. The first part talks about a game show called Golden Balls and the prisoner's dilemma, and how a guy — who plays and wins game shows for a living — won this one. The whole show is entertaining as usual, but this first part is of particular interest. After listening to that, watch the Golden Balls clip to see how it played out.
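
    For a sense of why the Golden Balls finale resembles a prisoner's dilemma, here is a minimal sketch. It assumes the show's split-or-steal rules as they are commonly described: two splits share the jackpot, a lone stealer takes it all, and two steals leave both players with nothing. The function names and the jackpot amount are hypothetical.

```python
CHOICES = ("split", "steal")


def payoffs(choice_a, choice_b, jackpot):
    """Return (payout_a, payout_b) for one round of split-or-steal."""
    if choice_a not in CHOICES or choice_b not in CHOICES:
        raise ValueError("choices must be 'split' or 'steal'")
    if choice_a == "split" and choice_b == "split":
        return jackpot / 2, jackpot / 2
    if choice_a == "steal" and choice_b == "split":
        return jackpot, 0
    if choice_a == "split" and choice_b == "steal":
        return 0, jackpot
    return 0, 0  # both steal


def weakly_dominates(option, other, jackpot):
    """True if `option` never pays player A less than `other` and sometimes pays more."""
    gains = [
        payoffs(option, b, jackpot)[0] - payoffs(other, b, jackpot)[0]
        for b in CHOICES
    ]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


jackpot = 10_000  # hypothetical amount
for a in CHOICES:
    for b in CHOICES:
        print(f"A {a:5s}, B {b:5s} -> {payoffs(a, b, jackpot)}")

print("steal weakly dominates split:", weakly_dominates("steal", "split", jackpot))
```

    Under these rules stealing never pays less than splitting, which is what makes getting both players to split so tricky.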

  • An exploration of selfies

    February 25, 2014  |  Data Art

    Selfie City

    Selfiecity, from Lev Manovich, Moritz Stefaner, and a small group of analysts and researchers, is a detailed visual exploration of 3,200 selfies from five major cities around the world. The project is both a broad look at demographics and trends and a chance to look closer at the individual observations.

  • Near-real-time global forest watch

    February 24, 2014  |  Mapping

    Global forest watch

    Global Forest Watch uses satellite imagery and other technologies to estimate forest usage, change, and tree cover (among other things). These estimates, and the actions that eventually followed from them, used to be slow. Now they're near-real-time.

    This is about to change with the launch of Global Forest Watch—an online forest monitoring system created by the World Resources Institute, Google and a group of more than 40 partners. Global Forest Watch uses technologies including Google Earth Engine and Google Maps Engine to map the world’s forests with satellite imagery, detect changes in forest cover in near-real-time, and make this information freely available to anyone with Internet access.

    Many layers and high granularity. Take your time with this one.

  • A human-readable explorer for SEC filings

    February 21, 2014  |  Statistical Visualization

    SEC filings

    Maris Jensen just made SEC filings readable by humans. The motivation:

    But in the twenty years since, despite hundreds of millions invested in rounds of contracted EDGAR modernization efforts and interactive data false starts, the SEC's EDGAR has remained almost untouched. In 2014, the SEC is quite literally doing less with SEC filings than their predecessors had planned for 1984. Data tagging is the red-headed stepchild of the Commission -- out of hundreds of forms, only about a dozen are filed as structured data -- and the first program to automate the selection of SEC filings for review, the Division of Economic and Risk Analysis (DERA)'s 'Robocop', has been 'aspirational' for years. The academics in the division responsible for the SEC's interactive data initiatives write papers about information asymmetry, using EDGAR data they repurchase in usable form for millions each year, but do nothing to fix it. Companies are chastised for insufficient and inefficient disclosure, while the SEC fails to help retail investors navigate corporate disclosures at all.

    Look up a company and see their financials, ownership, influences, and board members, among other things typically not so straightforward to look up.

  • Using slime mold to find the best motorway routes

    February 20, 2014  |  Mapping

    This is all sorts of neat. Researchers Andrew Adamatzky and Ramon Alonso-Sanz are using a slime mold, P. polycephalum, to find the most efficient road routes, which could provide guidance on how to rework existing ones. P. polycephalum is a single-celled organism that forages for food through various branches, and when it finds the most efficient food source, backs away from the others. The video above is a sped-up version of it in action. Adamatzky and Alonso-Sanz put a map underneath.

    We cut agar plates in a shape of Iberian peninsula, place oat flakes at the sites of major urban areas and analyse the foraging network developed. We compare the plasmodial network with principle motorways and also analyse man-made and plasmodium networks in a framework of planar proximity graphs.

    [via infosthetics]

  • Why we think of north pointing up

    February 19, 2014  |  Mapping

    Claudius Ptolemy world map

    Nick Danforth for Al Jazeera delves into the history books for why north is typically on the top of our maps. There's no single reason for it, but Ptolemy might have had something to do with it.

    The north's position was ultimately secured by the beginning of the 16th century, thanks to Ptolemy, with another European discovery that, like the New World, others had known about for quite some time. Ptolemy was a Hellenic cartographer from Egypt whose work in the second century A.D. laid out a systematic approach to mapping the world, complete with intersecting lines of longitude and latitude on a half-eaten-doughnut-shaped projection that reflected the curvature of the earth. The cartographers who made the first big, beautiful maps of the entire world, Old and New — men like Gerardus Mercator, Henricus Martellus Germanus and Martin Waldseemuller — were obsessed with Ptolemy. They turned out copies of Ptolemy's Geography on the newly invented printing press, put his portrait in the corners of their maps and used his writings to fill in places they had never been, even as their own discoveries were revealing the limitations of his work.

    Ptolemy put north on top. Although, we don't know why he put it there.

  • A visual explanation of conditional probability

    February 18, 2014  |  Statistics

    Conditional probability

    Victor Powell, who has visualized the Central Limit Theorem and Simpson's Paradox, most recently provided a visual explainer for conditional probability.

    Two bars, one blue and one red, represent two events that can happen together or independently of the other. When a ball hits a bar the corresponding event occurs. What is the probability that one event occurs given that the other does and vice versa? If the probability of both events increases and decreases, how does that change the separate probabilities? Sliders and options let you experiment, and the visual and counters change to help you learn.

    A fun one to tinker with.
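
    You can mimic the idea with a quick simulation. The sketch below assumes each ball falls at a uniformly random horizontal position between 0 and 1, and that each bar is an interval along that axis. This is a simplification for illustration, not Powell's actual code, and the bar positions are made up.

```python
import random


def simulate(bar_a, bar_b, n_balls=100_000, seed=0):
    """Drop balls at random positions and estimate conditional probabilities.

    bar_a and bar_b are (left, right) intervals within [0, 1).
    """
    rng = random.Random(seed)
    hits_a = hits_b = hits_both = 0
    for _ in range(n_balls):
        x = rng.random()
        in_a = bar_a[0] <= x < bar_a[1]
        in_b = bar_b[0] <= x < bar_b[1]
        hits_a += in_a
        hits_b += in_b
        hits_both += in_a and in_b

    def ratio(numerator, denominator):
        # Undefined if the conditioning event never happened.
        return numerator / denominator if denominator else float("nan")

    return {
        "P(A)": hits_a / n_balls,
        "P(B)": hits_b / n_balls,
        "P(B|A)": ratio(hits_both, hits_a),
        "P(A|B)": ratio(hits_both, hits_b),
    }


# Made-up bars that overlap between 0.5 and 0.6.
result = simulate(bar_a=(0.2, 0.6), bar_b=(0.5, 0.8))
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

    With these bars, the exact values are P(B|A) = 0.1 / 0.4 = 0.25 and P(A|B) = 0.1 / 0.3, about 0.33, so the two conditional probabilities differ even though they share the same overlap.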

  • Surviving on minimum wage

    February 17, 2014  |  Infographics

    Surviving on minimum wage

    As most of us know, it's not easy getting by on minimum wage, and in some places it's not possible. The New York Times provides a calculator to see how challenging it can be.

    A simple visual on the right shows dollars made per year, one box per dollar colored green initially and then red to signal debt. It's a good way to make the numbers more relatable. Select a state, enter expenses, and watch dollars disappear. Most likely, you'll end up in the red early.

  • Data grab bag

    February 14, 2014  |  Miscellaneous

    Spring

    Using Dates and Times in R.

    — Jerzy Wieczorek describes his first semester as a stat PhD student.

    — Apparently there's a stochastic process in probability theory called the Chinese Restaurant Process, and a closely related Indian Buffet Process. Matt Dickenson made a quick visualization to demonstrate the latter. These require more investigation for their names alone.

    — Stravinsky's The Rite of Spring visualized.

    — Why didn't anyone tell me Arduino was so fun and accessible? It's like LEGOs raised to the nerdeth degree.

    Source code in TV and film.

  • Reporter app, for self-discovery through data

    February 13, 2014  |  Self-surveillance

    Reporter app

    Nicholas Felton, Drew Breunig, and Friends of the Web released Reporter for iPhone. The app—$3.99 on the app store—prompts you with quizzes, such as who you're with or what you're doing, sparsely throughout the day to help you collect data about yourself and surroundings. You can also create your own survey questions to collect data on what interests you and use your phone's existing capabilities to record location, sound levels, weather, and photo counts automatically.

  • Basketball analytics

    February 12, 2014  |  Statistics

    Kirk Goldsberry talks about the rise of analytics usage in the NBA. With cameras above every court recording player movements, higher-granularity analysis is now possible, beyond the box score. One of the key metrics is expected possession value, or EPV, which estimates the number of points a possession is worth, given where everyone is on the court and where the ball is.

    But the clearest application of EPV is quantifying a player's overall offensive value, taking into account every single action he has performed with the ball over the course of a game, a road trip, or even a season. We can use EPV to collapse thousands of actions into a single value and estimate a player's true value by asking how many points he adds compared with a hypothetical replacement player, artificially inserted into the exact same basketball situations. This value might be called "EPV-added" or "points added."

    As a basketball fan, I hope this makes the game more fun and interesting to watch, and as a statistician, I hope this work can be applied to other facets of life like traffic or local movements. If just the latter, that'd be fine too.

  • Interactive maps with R

    February 11, 2014  |  Software

    Interactive maps with R

    You can make static maps in R relatively well, if you know what packages to use and what to look for, but there isn't much direct interaction with your graphics. rMaps is a package that helps you create maps that you can mouse over and zoom in to.

    Don't get too excited though. A scan of the docs shows that it's basically a wrapper around JavaScript libraries Leaflet, DataMaps and Crosslet, so you could learn those directly instead, and you'd be better for it in the long run if you plan to make more maps. But if you're just working on a one-off or must stay in R because your life depends on it, rMaps might be an option.

  • Olympic event explainer videos

    February 10, 2014  |  Infographics

    Olympics coverage by NYT

    Winter Olympic events are filled with subtleties that, if you know about them, can help you appreciate athletes' skills and the sports a bit more. The New York Times published three explainer videos to help you do just that. So far, there's one on slopestyle, which has roots in the Winter X Games, another on the luge, which is freakin' dangerous, and a third on the halfpipe, from Shaun White's perspective. The features are a nice combination of video, graphics, and narrative.

    If you're watching the Olympics, do yourself a favor and bookmark NYT Olympic coverage.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.