• Looking for a job in data science, visualization, or statistics? There are openings on the board.

    Assistant/Associate Professor-Statistics-Mathematical Sciences for the University of Massachusetts Lowell in Lowell, Massachusetts.

    Data Visualization Engineer / Reporting Manager for Nike in Portland, Oregon.

    Lead Data Visualization Design-Developer (freelance) for SingTel in Singapore.

  • After he noticed gambling odds fluctuate wildly at the end of a football game, Todd Schneider realized a correlation between betting odds and game excitement. The Gambletron 2000 is a fun look into the proxy.

    It occurred to me then that variance in gambling market odds is a good way to quantify how exciting a game is. Modern betting exchanges allow gamblers to bet throughout the course of a game. The odds, which can also be expressed as win probabilities, continually readjust as the game progresses. My claim is that the more the odds fluctuate during a game, the more exciting that game is.

    Games and odds update automatically up to the minute, with a highlight on the “hotness” of games, or the amount of variation over time. A blowout game shows a line that heads steadily towards a 100 percent probability that one team will win, whereas a comeback game shows a line that climbs towards 100 percent for one team and then swings back towards 100 percent for the opposition.

    I had the odds for the Golden State-Portland game open for part of the time tonight, and it was kind of a fun accompaniment.

    Mobile alert app for sports, anyone? Current offerings are abysmal.
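
    As a rough illustration of the “more fluctuation means more exciting” idea, here is a minimal sketch that scores a game by the total movement in one team’s win probability. The `hotness` function and the two probability series are made up for illustration; this is not Schneider’s actual formula or real odds data, just one simple way to measure fluctuation.

```python
def hotness(win_probs):
    """Total absolute movement in a series of win probabilities.

    This is one simple stand-in for "fluctuation"; it is an assumption,
    not the measure used by Gambletron 2000.
    """
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


# Hypothetical win probabilities for the same team over the course of a game.
blowout = [0.50, 0.65, 0.80, 0.90, 0.97, 1.00]   # drifts steadily to a win
comeback = [0.50, 0.75, 0.90, 0.60, 0.30, 0.00]  # leads, then collapses

print(f"blowout:  {hotness(blowout):.2f}")
print(f"comeback: {hotness(comeback):.2f}")
```

    The comeback series scores higher because its line has to travel up and then all the way back down, whereas the blowout only moves in one direction.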

  • The Star Tribune has a fun interactive that recommends Minnesota brews, based on five key beer characteristics. Use sliders to enter your preference for bitterness, aroma, etc., and the results come in radar graph form.

    Whether you’re a creature of habit or always up for something new, this tool will help you get to know what’s brewing in Minnesota. We’ve catalogued more than 100 beers from 36 Minnesota breweries and sorted them by five characteristics.

    I fully expect someone to expand this to the rest of the world.

  • The Atlantic interviewed Dr. Demetrios Matsakis, Chief Scientist for Time Services at the US Naval Observatory, about where time comes from, the precision required and how they obtain it, and why we need such precision. Five seconds into it, my wife commented, “That sounds nerdy.” That’s how you know it’s gonna be good.

  • Tony Haile discusses how we read and share online, based on actual data. It’s not as click- and pageview-based as you might think.

    A widespread assumption is that the more content is liked or shared, the more engaging it must be, the more willing people are to devote their attention to it. However, the data doesn’t back that up. We looked at 10,000 socially-shared articles and found that there is no relationship whatsoever between the amount a piece of content is shared and the amount of attention an average reader will give that content.

    When we combined attention and traffic to find the story that had the largest volume of total engaged time, we found that it had fewer than 100 likes and fewer than 50 tweets. Conversely, the story with the largest number of tweets got about 20% of the total engaged time that the most engaging story received.

  • There’s plenty of software to muck around with data, but to gain the skills to really get something out of it, that takes time and experience. Mikio Braun, a post doc in machine learning, explains.

    For a number of reasons, I don’t think that you cannot “toolify” data analysis that easily. I wished it would be, but from my hard-won experience with my own work and teaching people this stuff, I’d say it takes a lot of experience to be done properly and you need to know what you’re doing. Otherwise you will do stuff which breaks horribly once put into action on real data.

    And I don’t write this because I don’t like the projects which exists, but because I think it is important to understand that you can’t just give a few coders new tools and they will produce something which works. And depending on how you want to use data analysis in your company, this might break or make your company.

    Braun breaks it down into four bullet points worth a read, but the tl;dr version is that analysis isn’t simple, and no tool is going to do everything for you. It’s simple with simple data, but you can almost always go deeper with more data, and it takes experience to ask the right questions. So try not to be too content with that software output.

  • John McDuling for Quartz writes about the FiveThirtyEight replacement.

    David Leonhardt, the Times’ former Washington bureau chief, who is in charge of The Upshot, told Quartz that the new venture will have a dedicated staff of 15, including three full-time graphic journalists, and is on track for a launch this spring. “The idea behind the name is, we are trying to help readers get to the essence of issues and understand them in a contextual and conversational way,” Leonhardt says. “Obviously, we will be using data a lot to do that, not because data is some secret code, but because it’s a particularly effective way, when used in moderate doses, of explaining reality to people.”

    With the new FiveThirtyEight coming soon, The Upshot, and plenty of smaller bits sprouting up in other areas, this data-driven news thing might be more than a fad. Hey, statisticians, you want to get in on this? Seriously, there’s plenty of data to go around.

  • How much caffeine can you consume during the day and still fall asleep at night? For some, it’s one cup and they’re up all night, whereas others don’t feel a thing. UP Coffee, an app from Jawbone Labs, helps you understand your own consumption and caffeine tolerance.

    Data entry is straightforward since it’s only for caffeine-related beverages, such as coffee and soda. Enter your beverage, and the app tabulates caffeine amounts for you.

    The key though is that it doesn’t just stop at milligrams. What’s 100 milligrams of caffeine mean anyways? Instead, with a focus on sleep, it tells you how much caffeine you’ve consumed and how many hours you’re expected to feel the effects.

    Pair it with your Jawbone UP band and account for an even wider picture, although you don’t have to. I’ve been using the app with neither, and it’s still fun to play with. And it kind of makes me want a band.

  • Forget bell curves, jellybeans, and coin flips to explain statistical concepts. Dancing Statistics is a video series that demonstrates variance, correlation, and sampling through choreographed movements. The dance below explains variance.

    Watch the full playlist here. [via infosthetics]

  • Justin Blinder used New York’s city planning dataset and Google Streetview for a before and after view of vacant lots.

    Vacated mines and combines different datasets on vacant lots to present a sort of physical facade of gentrification, one that immediately prompts questions by virtue of its incompleteness: “Vacated by whom? Why? How long had they been there? And who’s replacing them?” Are all these changes instances of gentrification, or just some? While we usually think of gentrification in terms of what is new or has been displaced, Vacated highlights the momentary absence of such buildings, either because they’ve been demolished or have not yet been built. All images depicted in the project are both temporal and ephemeral, since they draw upon image caches that will eventually be replaced.

  • Based on reviews from BeerAdvocate, Beer Viz, a visualization class project, asks you to choose a general style of beer and a beer that you like. Then it shows you beers that are similar, based on appearance, taste, aroma, and overall score. It’s like a visual version of the beer recommendation system we saw last year.

  • The NBA has been kind of gaga over data the past few years, and they recently announced that all 30 teams would have player tracking installed so they can see where they go at night after games. Wait, no. I mean so that there is data on where each player is on the court at any given time. Fathom Information Design played with some of this data for an Oklahoma City versus San Antonio game, with some sketches.

    Above are the movements of power forward Tim Duncan, who sticks around the middle of the court throughout a game. A guard, on the other hand, runs around the court more. This is obvious if you’ve watched him play, but sketches like this coupled with spatiotemporal analysis could be interesting.

    Also, I get the sense that there are more people who want to learn from this data than there are people who know how to analyze it, so if you’re a statistician on the job hunt, there’s that.

  • One of the main challenges of any data project is getting the data. It seems obvious, but the effort to get the right data to answer a question seems to catch people off guard. Even data that’s “free” to download can be a huge pain that ends up completely useless. ProPublica, the non-profit newsroom, deals with this stuff on a regular basis and hopes that some of their efforts can turn into a source of funding through the Data Store.

    Like most newsrooms, we make extensive use of government data — some downloaded from “open data” sites and some obtained through Freedom of Information Act requests. But much of our data comes from our developers spending months scraping and assembling material from web sites and out of Acrobat documents. Some data requires months of labor to clean or requires combining datasets from different sources in a way that’s never been done before.

    In the Data Store you’ll find a growing collection of the data we’ve used in our reporting. For raw, as-is datasets we receive from government sources, you’ll find a free download link that simply requires you agree to a simplified version of our Terms of Use. For datasets that are available as downloads from government websites, we’ve simply linked to the sites to ensure you can quickly get the most up-to-date data.

    For datasets that are the result of significant expenditures of our time and effort, we’re charging a reasonable one-time fee: In most cases, it’s $200 for journalists and $2,000 for academic researchers.

    I hope it works.

  • After noticing the later dinner time in Spain, Stefano Maggiolo pointed to relatively late sunsets, compared to standard time, as one possible reason. Then he mapped sunset time versus standard time around the world.

    Looking for other regions of the world having the same peculiarity of Spain, I edited a world map from Wikipedia to show the difference between solar and standard time. It turns out, there are many places where the sun rises and sets late in the day, like in Spain, but not a lot where it is very early (highlighted in red and green in the map, respectively). Most of Russia is heavily red, but mostly in zones with very scarce population; the exception is St. Petersburg, with a discrepancy of two hours, but the effect on time is mitigated by the high latitude. The most extreme example of Spain-like time is western China: the difference reaches three hours against solar time. For example, today the sun rises there at 10:15 and sets at 19:45, and solar noon is at 15:01.
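
    The gap between solar and standard time follows from simple arithmetic: the Earth turns 15 degrees per hour, so each degree of longitude away from a time zone’s central meridian shifts solar noon by about four minutes. The sketch below works that out. The function name and the approximate longitudes are my own assumptions, and it ignores daylight saving time and the seasonal equation-of-time correction, so it will not reproduce the exact sunrise and sunset times quoted above.

```python
def solar_noon_offset_minutes(longitude_deg, utc_offset_hours):
    """Minutes by which mean solar noon falls after 12:00 on the clock.

    longitude_deg: degrees east of Greenwich (west is negative).
    utc_offset_hours: the zone's standard offset from UTC.
    Ignores daylight saving time and the equation of time.
    """
    standard_meridian = 15.0 * utc_offset_hours  # 15 degrees per hour
    return 4.0 * (standard_meridian - longitude_deg)  # 4 minutes per degree


# Approximate longitudes, used here only for illustration.
kashgar = solar_noon_offset_minutes(76.0, 8)   # western China, on UTC+8
madrid = solar_noon_offset_minutes(-3.7, 1)    # Spain, on UTC+1 in winter

print(f"Kashgar: solar noon about {kashgar:.0f} minutes after 12:00")
print(f"Madrid:  solar noon about {madrid:.0f} minutes after 12:00")
```

    With these inputs, western China comes out at close to three hours and Madrid at a bit over one hour, before any daylight saving adjustment.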

  • Why buy the bigger pizza? Because you get more pizza to eat, and if you don’t finish it, you’ll have breakfast tomorrow. Other than that fine reason, well, it’s geometrically the better deal. Planet Money explains with an interactive that shows the price per square inch for 3,678 pizza places across the United States, based on data from Grubhub.

    The math of why bigger pizzas are such a good deal is simple: A pizza is a circle, and the area of a circle increases with the square of the radius.

    So, for example, a 16-inch pizza is actually four times as big as an 8-inch pizza.

    And when you look at thousands of pizza prices from around the U.S., you see that you almost always get a much, much better deal when you buy a bigger pizza.

    You get more pizza, and the business gets more money with minimal extra pizza-making effort. Win-win. Although, keep going on the horizontal axis and I bet that curve starts to curl up. Where can I get a ten-foot pizza?
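
    To make the geometry concrete, here is a small sketch that computes pizza area from diameter and the resulting price per square inch. The function names and the two prices are hypothetical, chosen only to show the arithmetic; they are not taken from the Grubhub data.

```python
import math


def pizza_area(diameter_in):
    """Area of a round pizza in square inches, given its diameter."""
    return math.pi * (diameter_in / 2) ** 2


def price_per_sq_inch(price, diameter_in):
    """Dollars per square inch for a round pizza."""
    return price / pizza_area(diameter_in)


# Doubling the diameter quadruples the area.
ratio = pizza_area(16) / pizza_area(8)
print(f"16-inch vs 8-inch area ratio: {ratio:.1f}")

# Hypothetical prices: even at double the price, the larger pizza
# costs half as much per square inch.
small = price_per_sq_inch(10.00, 8)
large = price_per_sq_inch(20.00, 16)
print(f"8-inch:  ${small:.3f} per square inch")
print(f"16-inch: ${large:.3f} per square inch")
```

    Because area grows with the square of the radius, a menu would have to quadruple the price on the 16-inch pie just to match the per-inch cost of the 8-inch one.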

  • The chart type often goes overlooked because people don’t understand it. Maybe this will help.

  • Looking for a job in data science, visualization, or statistics? There are openings on the board.

    Senior Associate Director, Analytics for the University of Chicago in Chicago, Illinois.

    Data Scientist for Thumbtack in San Francisco, California.

    Communications Officer, Measurement and Analysis for the Bill and Melinda Gates Foundation in Seattle, Washington.

    Senior Graphics Editor for The Wall Street Journal in New York, New York.

    Basketball Analyst for the Philadelphia 76ers in Philadelphia, Pennsylvania.

  • I like how a little bit of game theory has crept into Jeopardy! with contestant Arthur Chu. He bounces around the board in search of Daily Doubles and bets to tie in Final Jeopardy. Chu doesn’t know much about game theory himself but applies rules promoted by a past contestant.

    The ultimate champion, Ken Jennings, praises Chu on Slate.

    But in fact, plenty of nice white boys on Jeopardy! have been pilloried by viewers for using Arthur Chu’s signature technique: bopping around the game board seemingly at whim, rather than choosing the clues from top to bottom, as most contestants do. This is Chu’s great crime, the kind of anarchy that hard-core Jeopardy! fans will not countenance. The technique was pioneered in 1985 by a five-time champ named Chuck Forrest, whose law school roommate suggested it. The “Forrest bounce,” as fans still call it, kept opponents off balance. He would know ahead of time where the next clue would pop up; they’d be a second slow.

    I don’t watch Jeopardy! much, but it’s pretty fun to watch Chu dominate.

    Then there’s the most recent RadioLab. The first part talks about a game show called Golden Balls and the prisoner’s dilemma, and how a guy — who plays and wins game shows for a living — won this one. The whole show is entertaining as usual, but this first part is of particular interest. After listening to that, watch the Golden Balls clip to see how it played out.

  • Selfiecity, from Lev Manovich, Moritz Stefaner, and a small group of analysts and researchers, is a detailed visual exploration of 3,200 selfies from five major cities around the world. The project is both a broad look at demographics and trends and a chance to look closer at the individual observations.

  • Global Forest Watch uses satellite imagery and other technologies to estimate forest usage, change, and tree cover (among other things). These estimates and their eventual actions used to be slow. Now they’re near-real-time.

    This is about to change with the launch of Global Forest Watch—an online forest monitoring system created by the World Resources Institute, Google and a group of more than 40 partners. Global Forest Watch uses technologies including Google Earth Engine and Google Maps Engine to map the world’s forests with satellite imagery, detect changes in forest cover in near-real-time, and make this information freely available to anyone with Internet access.

    Many layers and high granularity. Take your time with this one.