• June 10, 2013

    Topic: News

    With all the stuff going on with surveillance and data privacy — especially the past week — it’s worthwhile to revisit this essay by Daniel J. Solove, a professor of law at George Washington University, on why privacy matters even if you “have nothing to hide.”

    “My life’s an open book,” people might say. “I’ve got nothing to hide.” But now the government has large dossiers of everyone’s activities, interests, reading habits, finances, and health. What if the government leaks the information to the public? What if the government mistakenly determines that based on your pattern of activities, you’re likely to engage in a criminal act? What if it denies you the right to fly? What if the government thinks your financial transactions look odd—even if you’ve done nothing wrong—and freezes your accounts? What if the government doesn’t protect your information with adequate security, and an identity thief obtains it and uses it to defraud you? Even if you have nothing to hide, the government can cause you a lot of harm.

    “But the government doesn’t want to hurt me,” some might argue. In many cases, that’s true, but the government can also harm people inadvertently, due to errors or carelessness.

    You might not have anything to hide right now, but maybe a random string of choices that was completely harmless looks a lot like something else a few years from now, to someone sniffing around the archives. The patterns when there are no patterns sort of thing. Personal data without the person. [via @hmason]

  • The Brewers Association just released data for 2012 on craft beer production and growth. The New Yorker mapped the data in a straightforward interactive.

    As of March, the United States was home to nearly two thousand four hundred craft breweries, the small producers best known for India pale ales and other decidedly non-Budweiser-esque beers. What’s more, they are rapidly colonizing what one might call the craft-beer frontier: the South, the Southwest, and, really, almost any part of the country that isn’t the West or the Northeast.

    Most articles and lists on craft beer tend to focus on total production and breweries, so California, a big state with a lot of people, always ends up on top. And as a Californian, I’m more than happy with my access to all the fine brews around here, but clearly, there are many more states to visit. RV trip anyone? [via @kennethfield]

  • Because every day is a good day to listen to Hans Rosling talk numbers. In this short video, Rosling uses Lego bricks to explain population growth and the gaps in wealth and carbon footprint.

  • When you talk to different people across the United States, you notice small differences in how people pronounce words and phrases. Sometimes different terms are used to describe the same thing. Bert Vaux’s dialect survey tried to capture these differences, and NC State statistics graduate student Joshua Katz mapped the data.

  • Josh Orter takes back-of-the-napkin math to the next level with Stupid Calculations, which promises to turn practical facts into utterly useless ones. Stupid calculation number one is the size of a giant iPhone screen if you combined all the iPhone screens ever sold into one.

    The eye-glazing calculations are laid out below for those who appreciate the dirty work but, skipping ahead, the Kubrick-inspired monophone would stretch 5,059 feet into the sky and have a base measuring 2,846 feet across (Central Park is 2,640 feet wide). Its surface area would take in 2.07 billion square inches. That’s 14.39 million square feet or 330.54 acres. The new World Trade Center, by comparison, will have a surface area of 23 glass-clad acres, giving us enough screenage to watch Game of Thrones on all four sides of fourteen WTCs.

    See also how long it would take to drink the water in an Olympic-sized pool through a straw.
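    The unit conversions in the quoted figures are easy to sanity-check. Below is a minimal sketch, assuming only the standard US conversion factors and the rounded numbers from the quote; the function name is made up for illustration and is not from Stupid Calculations.

```python
# Unit-conversion check for the quoted figures (standard US units).
SQ_IN_PER_SQ_FT = 12 * 12   # 144 square inches in a square foot
SQ_FT_PER_ACRE = 43_560     # square feet in an acre


def square_inches_to_acres(square_inches):
    """Return (square_feet, acres) for an area given in square inches."""
    square_feet = square_inches / SQ_IN_PER_SQ_FT
    acres = square_feet / SQ_FT_PER_ACRE
    return square_feet, acres


# 2.07 billion square inches is the rounded figure from the quote.
square_feet, acres = square_inches_to_acres(2.07e9)

# The quote gives 330.54 acres of screen and 23 acres of glass per tower.
towers_covered = 330.54 / 23

print(f"{square_feet / 1e6:.2f} million sq ft, {acres:.1f} acres")
print(f"{towers_covered:.1f} towers")
```

    Starting from the rounded 2.07 billion square inches, the results land within a fraction of a percent of the quoted square feet and acres, and the tower count comes out a bit above fourteen.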

  • Using data from the London Fire Brigade, James Cheshire mapped 144,000 incidents in London.

    This map shows the geography of fire engine callouts across London between January and September 2011. Each of the 144,000 or so lines represents a fire engine (pump) attending an incident (rounded to the nearest 100m) and they have been coloured according to the broad type of incident attended. These incident types have been further broken down in the bar chart on the bottom right. False alarms (in blue), for example, can be malicious (fortunately these are fairly rare), genuine or triggered by an automatic fire alarm (AFA). As the map shows, false alarms – thanks I guess to AFAs in office buildings – seem most common in central London.

    It looks a lot like a sky of fireworks in this view. I bet a map for each category might help flesh out different patterns.

  • Microsoft researcher Kate Crawford describes several myths of big data. Myth #4: It makes cities smarter.

    “It’s only as good as the people using it,” Ms. Crawford said. Many of the sensors that track people as they manage their urban lives come from high-end smartphones, or cars with the latest GPS systems. “Devices are becoming the proxies for public needs,” she said, “but there won’t be a moment where everyone has access to the same technology.” In addition, moving cities toward digital initiatives like predictive policing, or creating systems where people are seen, whether they like it or not, can promote lots of tension between individuals and their governments.

    Yep. I hear those people things can introduce a lot of challenges.

  • Data Points: Visualization that Means Something

    It seems like ages since we ran one of these.

    It’s hard to believe Data Points hit the shelves two months ago. (Thank you to everyone who got a copy!) It still feels brand new in my head. I kind of thought that time would slow down after I finished the book (and dissertation), but it seems to be moving even faster now.

    Anyways, if you’d like a chance to win a copy of Data Points blemished by my signature, leave a comment below by Wednesday, June 5, 2013 11:59pm PST. Tell us what your favorite number is and why. One entry per person please. I’ll pick a winner at random via sample() in R. Good luck.

    And of course, if you can’t wait, have never been lucky at cards, or want a blemish-free version of the book, you can get it at online and physical bookstores everywhere.
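    For the curious, the random draw mentioned above is a one-liner with sample() in R. A rough Python analogue, as a sketch only: the function name and commenter names are hypothetical, and deduplicating entries is my assumption about how "one entry per person" would be enforced.

```python
import random


def pick_winner(commenters, seed=None):
    """Pick one winner uniformly at random, one entry per person."""
    # Deduplicate while preserving order, to honor "one entry per person".
    unique_entries = list(dict.fromkeys(commenters))
    rng = random.Random(seed)  # pass a seed only to make a draw reproducible
    return rng.choice(unique_entries)


# Hypothetical commenter names, for illustration only.
commenters = ["ana", "ben", "ana", "chen", "dee"]
winner = pick_winner(commenters)
print(winner)
```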

  • How to Make Slopegraphs in R

    Also known as specialized or custom line charts. Figure out how to draw lines with the right spacing and pointed in the right direction, and you’ve got your slopegraphs.
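    The "right spacing, right direction" part boils down to putting both columns on one shared vertical scale. Here is a minimal sketch of that geometry in Python rather than R; it is not the tutorial's code, and the function name, parameters, and data are all made up for illustration.

```python
def slopegraph_segments(values, x_left=0.0, x_right=1.0, height=100.0):
    """Map {label: (before, after)} to line segments on a shared scale.

    Returns {label: ((x_left, y_before), (x_right, y_after))}, with y scaled
    so the smallest value sits at 0 and the largest at `height`.
    """
    all_values = [v for pair in values.values() for v in pair]
    low, high = min(all_values), max(all_values)
    span = (high - low) or 1.0  # avoid dividing by zero if all values match

    def scale(v):
        return (v - low) / span * height

    return {
        label: ((x_left, scale(before)), (x_right, scale(after)))
        for label, (before, after) in values.items()
    }


# Hypothetical before/after values, for illustration only.
data = {"A": (10, 30), "B": (20, 20), "C": (30, 15)}
segments = slopegraph_segments(data)
for label, (start, end) in segments.items():
    if end[1] > start[1]:
        direction = "up"
    elif end[1] < start[1]:
        direction = "down"
    else:
        direction = "flat"
    print(label, start, end, direction)
```

    Each segment can then be handed to whatever drawing routine you like; labels go at the two endpoints.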

  • The central limit theorem:

    In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.

    Victor Powell animated said random variables falling into a normal distribution (which should look familiar to those who have seen that ping pong ball exhibit in exploratoriums and science museums). Play around with the number of bins and delay time and watch it go.
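    If you'd rather see the theorem in numbers than in falling balls, here is a minimal simulation sketch. It is not Powell's code. It assumes Uniform(0, 1) draws, and the function names, bin range, and trial counts are arbitrary choices for illustration. For 30 draws per mean, theory puts the spread of the means at about sqrt(1/12) / sqrt(30), or roughly 0.053.

```python
import random
import statistics


def sample_means(draws_per_mean, trials, seed=0):
    """Return `trials` means, each of `draws_per_mean` Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(draws_per_mean)) / draws_per_mean
        for _ in range(trials)
    ]


def text_histogram(values, bins=10, low=0.0, high=1.0, width=50):
    """Count values into equal-width bins and return one text bar per bin."""
    counts = [0] * bins
    for v in values:
        index = int((v - low) / (high - low) * bins)
        index = max(0, min(index, bins - 1))  # clamp out-of-range values
        counts[index] += 1
    peak = max(counts) or 1
    return ["#" * round(c / peak * width) for c in counts]


means = sample_means(draws_per_mean=30, trials=5000)
print(statistics.mean(means), statistics.stdev(means))
for bar in text_histogram(means, bins=20, low=0.3, high=0.7):
    print(bar)
```

    Change the number of bins or the draws per mean and the bell shape should hold up, just narrower or wider.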

  • June 2, 2013

    Topic: Maps

    Twitter mapped all the geotagged tweets since 2009. There are billions of them, so as you might expect, roads, city centers, and pathways emerge. And it only took 20 lines of R code to make the maps.

  • In celebration of Arrested Development’s return via Netflix, NPR combed through the jokes — obvious and obscure — and set them in a handy interactive guide.

    Arrested Development is back! Because we’re obsessed we care about your watching enjoyment, we wrote down all the recurring gags in every episode — including the new season 4 episodes — with special attention to jokes hidden in the background (like Cloudmir vodka) or being foreshadowed (like when Buster lost his hand).

    The three categories of joke are color-coded, where each row represents a joke and a tick represents an occurrence of that joke over four seasons.

    I’ve only watched a handful of episodes, but I’m tempted to turn on Netflix with this guide in front of me. [Thanks, @onyxfish]

  • In a distributed denial-of-service attack, a bunch of machines make a bunch of requests to a server to make it buckle under the pressure. There was recently an attack on VideoLAN’s download infrastructure. Here’s what it looked like.



  • The Centers for Medicare and Medicaid Services released billing data for more than 3,000 U.S. hospitals, showing high variance in the cost of health care across the country and even between nearby hospitals.

    As part of the Obama administration’s work to make our health care system more affordable and accountable, data are being released that show significant variation across the country and within communities in what hospitals charge for common inpatient services.

    The data provided here include hospital-specific charges for the more than 3,000 U.S. hospitals that receive Medicare Inpatient Prospective Payment System (IPPS) payments for the top 100 most frequently billed discharges, paid under Medicare based on a rate per discharge using the Medicare Severity Diagnosis Related Group (MS-DRG) for Fiscal Year (FY) 2011. These DRGs represent almost 7 million discharges or 60 percent of total Medicare IPPS discharges.

    The data is downloadable as CSV or Excel files and is surprisingly usable and worth a look.

    The New York Times has a useful per-hospital browser and The Washington Post provides quick comparisons by state.
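    If you grab the CSV, a quick first look is the spread in charges for the same discharge group across hospitals. A minimal sketch follows; the rows, column names, and function name are hypothetical stand-ins, so check the header of the real download before reusing any of it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows and column names, for illustration only;
# the real file's header will differ.
SAMPLE_CSV = """drg,provider,average_covered_charges
DRG-A,Hospital 1,20000
DRG-A,Hospital 2,65000
DRG-B,Hospital 1,8000
DRG-B,Hospital 3,12000
"""


def charge_spread_by_drg(csv_file):
    """Return {drg: (min_charge, max_charge, max/min ratio)}."""
    charges = defaultdict(list)
    for row in csv.DictReader(csv_file):
        charges[row["drg"]].append(float(row["average_covered_charges"]))
    return {
        drg: (min(vals), max(vals), max(vals) / min(vals))
        for drg, vals in charges.items()
    }


spread = charge_spread_by_drg(io.StringIO(SAMPLE_CSV))
for drg, (low, high, ratio) in spread.items():
    print(f"{drg}: {low:,.0f} to {high:,.0f} ({ratio:.2f}x)")
```

    Swap the in-memory sample for an opened file handle to run it on the actual download.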