• Fast and slow visualization

    Visualization is often described in the context of speed and efficiency. Get the most insight for the least amount of ink or pixels. Elijah Meeks argues that visualization goes far beyond this point of view:

    This breakneck pace is a real data visualization constraint. It’s not a myth that charts are often deployed in rooms full of people who only have a short time to comprehend them (or not) and make a decision. Automatic views into datasources are a critical aspect of exploratory data analysis and health checks. The fast mode of data visualization is real and important, but when we let it become our only view into what data visualization is, we limit ourselves in planning for how to build, support and design data visualization. We limit not only data visualization creators but also data visualization readers.

    In the three-parter, Meeks tries to make the fuzzy aspects of visualization — meaning, insight, impact, etc. — more concrete.

    See also:

    Note the dates on all of them. We’ll figure out this visualization thing one of these days.

  • When surveillance turns into stalking

    Many surveillance apps cater to parents who want to keep tabs on their children’s mobile phones. But many of these apps are also used for less parental purposes. Jennifer Valentino-DeVries for The New York Times reports:

    More than 200 apps and services offer would-be stalkers a variety of capabilities, from basic location tracking to harvesting texts and even secretly recording video, according to a new academic study. More than two dozen services were promoted as surveillance tools for spying on romantic partners, according to the researchers and reporting by The New York Times. Most of the spying services required access to victims’ phones or knowledge of their passwords — both common in domestic relationships.

  • Amazon Rekognition for government surveillance

    Amazon’s Rekognition is a video analysis system that promises to identify individuals in real time. Amazon wants to sell the systems to governments for surveillance.

    From the ACLU:

    Amazon is marketing Rekognition for government surveillance. According to its marketing materials, it views deployment by law enforcement agencies as a “common use case” for this technology. Among other features, the company’s materials describe “person tracking” as an “easy and accurate” way to investigate and monitor people. Amazon says Rekognition can be used to identify “people of interest,” raising the possibility that those labeled suspicious by governments — such as undocumented immigrants or Black activists — will be seen as fair game for Rekognition surveillance. It also says Rekognition can monitor “all faces in group photos, crowded events, and public places such as airports,” at a time when Americans are joining public protests at unprecedented levels.

    Given the millions of Alexa-enabled devices in people’s homes and the customer purchase histories available on demand, this feels like a bad idea. Also, creepy. Probably because of the ‘k’ in Rekognition.

  • Wow your friends during the game with random win percentages, based on various player stats.
  • Data scientists as the new Mad Men

    Ken Auletta for The New Yorker looks at “math men” replacing the Mad Men:

    Engineers and data scientists vacuum data. They see data as virtuous, yielding clues to the mysteries of human behavior, suggesting efficiencies (including eliminating costly middlemen, like agency Mad Men), offering answers that they believe will better serve consumers, because the marketing message is individualized. The more cool things offered, the more clicks, the more page views, the more user engagement. Data yield facts and advance a quest to be more scientific—free of guesses. As Eric Schmidt, then the executive chairman of Google’s parent company, Alphabet, said at the company’s 2017 shareholder meeting, “We start from the principles of science at Google and Alphabet.”

    I know the big tech companies are where the money is, but I hope you young statisticians out there consider the other possibilities. Your skills are valued in many places.

  • Nigel Holmes’ new illustrated book on Crazy Competitions

    Nigel Holmes, the graphic designer known for his playful illustrated graphics, has a new book: Crazy Competitions. It’s exactly what it sounds like.

    Whether it’s flinging frozen rats or parading in holly evergreens, racing snails or carrying wives, human beings have long displayed their creativity in wild, odd, and sometimes just wonderful rituals and competitions. To show what lengths we’ll go to uphold our eccentric customs, British American graphic designer Nigel Holmes channels his belief in the power of hilarity to bring together a bewilderingly funny tour around the globe in search of incredible events, all dryly explained with brilliant infographics in Crazy Competitions.

    Ordered.

  • What’s in a food truck

    Food trucks are the real deal these days. The best ones serve a specialized menu really well, in a small, focused space. The Washington Post delves into the insides of several of these trucks and how they make the food with very specific equipment.

  • A visualization game to understand education and school segregation

    Educate Your Child by Gabrielle LaMarr LeMee uses census data and the school selection process to simulate the steps you might take in choosing your kid’s first school in Chicago.

    The Chicago public school system has a high level of school segregation as a result of parents’ residential and school choices as well as policy decisions that do not encourage integrated neighborhoods and schools.

    In this game, you are a parent of a 5-year-old child and now you have to make some decisions. Explore how your choices can have an impact on your child’s education and on the overall education of the city’s children.

    There should be more games like this based on census data. It seems to be a good way for an individual to latch on to data points while still getting a view of the grand scheme of things.

    See also LeMee’s write-up on the design for details on modeling school choice.

  • Imagine that those with immigrants in their family tree left the country. Almost everyone, basically.
  • Subway delays visually explained

    Adam Pearce for The New York Times describes the sad state of affairs that is New York’s delayed subway trains. One delay causes a ripple effect down the line, leaving little chance to get back on schedule. The simpler figures early on prepare you for the overall view at the end.

    This piece is specific to New York, but the idea applies to other transit systems and forms of transportation. See also the traffic gridlock simulation from a few years ago. It doesn’t take much for gridlock.
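    The ripple effect is easy to reproduce in a toy model. This is a minimal sketch, not the Times’ model: it assumes trains on one track can’t run closer together than a fixed minimum headway, so a late train holds back every train behind it until the slack in the schedule absorbs the delay. The function name, schedule, and headway values are hypothetical.

```python
# Toy model of delay propagation on one subway line (hypothetical numbers).
# Assumption: a train departs at its scheduled time plus any delay of its own,
# but never sooner than min_headway minutes after the train ahead of it.

def propagate_delays(scheduled, min_headway, initial_delays):
    """Return each train's actual departure time, in minutes."""
    actual = []
    for sched, delay in zip(scheduled, initial_delays):
        depart = sched + delay
        if actual:
            # Hold the train until there is safe spacing behind the previous one.
            depart = max(depart, actual[-1] + min_headway)
        actual.append(depart)
    return actual


if __name__ == "__main__":
    scheduled = [0, 4, 8, 12, 16]  # a train every 4 minutes
    min_headway = 3                # trains need at least 3 minutes of spacing
    initial = [0, 5, 0, 0, 0]      # only the second train has a 5-minute problem

    actual = propagate_delays(scheduled, min_headway, initial)
    delays = [a - s for a, s in zip(actual, scheduled)]
    print(delays)  # [0, 5, 4, 3, 2]
```

    With one minute of slack per train, the delay shrinks by only a minute with each train behind it. Set min_headway to 4 (no slack) and every following train inherits the full five minutes.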

  • Data is, sometimes

    The Financial Times recently updated its style guide:

    data — the rule for always using data as plural has been relaxed. If you read data as singular then write it as such. For example, we already allow singular for ‘big data’. And we should for personal data too. An easy rule would be that if it can be used as a synonym for information then it should probably be singular — and if we are using it as economic data and mean figures, then we should stick to plural.

    And for kicks, I dug up my New York Times style guide from 1999:

    data is acceptable as a singular term for information: The data was persuasive. In its traditional sense, meaning a collection of facts and figures, the noun can still be plural: They tabulate the data, which arrive from bookstores nationwide. (In this sense, the singular is datum, a word both stilted and deservedly obscure.)

    “Data are” sounds weird to me most of the time. When I say it like that, I feel like I should also drink a cup of tea with my pinky sticking out and a monocle firmly planted for distinction.

  • Every document copy stored on used digital photocopiers

    CBS News picked up four used photocopiers and looked at the hard drives. There was a lot of private information stored in them:

    Nearly every digital copier built since 2002 contains a hard drive – like the one on your personal computer – storing an image of every document copied, scanned, or emailed by the machine.

    In the process, it’s turned an office staple into a digital time-bomb packed with highly-personal or sensitive data.

    If you’re in the identity theft business it seems this would be a pot of gold.

    “The type of information we see on these machines with the social security numbers, birth certificates, bank records, income tax forms,” John Juntunen said, “that information would be very valuable.”

    Okay, dramatics aside, it’s still disconcerting.

    Not every photocopier makes it so easy to access copied documents, but it’s surprising that it’s still so straightforward with some machines these days. Then again, part of the responsibility belongs to the previous owners. As the Federal Trade Commission advises, getting rid of a copier is like getting rid of a computer.

  • Making useless things

    Simone Giertz, bringer of joy and self-described expert in shitty robots, makes machines that succeed in failing. In her TED talk, Giertz describes her path to becoming an expert in “useless” things. It’s all the more relevant since she found out she has a brain tumor.

  • What data visualization is for

    Eric Rodenbeck from Stamen Design discusses visualization as a medium, rather than as a tool or an insight-providing image:

    Dataviz! Data visualization! I don’t think it’s for anything! I don’t believe it’s meaningful to say that dataviz is for one thing, any more than it’s meaningful to say that architecture is for any one thing. Or that photography is for one thing, that it has a purpose that can be defined in a sentence or two. Or that movies are for one thing, one that you could win an argument about.

    Yes.

    See also Martin Wattenberg and Fernanda Viegas’ talk from a while back on the parallels between books and visualization.

  • Challenges ahead for the Census count

    The 2020 Census is coming up quick, but there’s still a lot up in the air. There’s no director, the bureau has to adjust to budget cuts, and a new digital system that promises to save money hasn’t been fully tested (because of lower funding). Exciting. Alvin Chang for Vox explains in more detail — with cartoons.

  • Comparison of terms and conditions lengths

    Most of us don’t read the terms and conditions before we click “I agree” for the web services we use. They’re too long, and we need likes right away. For a student project, Dima Yarovinsky printed on paper the terms and conditions of major social apps: WhatsApp, Google, Tinder, Twitter, Facebook, Snapchat, and Instagram. The printouts highlight what we’re getting into. [via @hailmika]

  • Members Only
    Moving your data from the digital screen to something more physical isn't as tricky as it seems. Here's how I did it.
  • Tweeting a map of every Census tract in the United States

    The @everytract bot on Twitter, by Neil Freeman, does what the name suggests: it tweets a map of every Census tract in numerical order, one map every half hour.

    Census data, and really data in general, is typically presented in aggregate or as overall trends, which requires an abstract view of many data points pushed together. So it’s nice to see a straightforward project focus on the individual.

    In this genre, the censusAmericans bot is my favorite. It tweets people’s biographies based on data from the American Community Survey.

  • Using statistical models to win almost $1B in horse-race gambling

    Kit Chellel for Bloomberg tells the riveting gambling story of Bill Benter, who used statistics to model horse racing in Hong Kong. My favorite part is the pre-Internet process Benter used to collect data and predict results:

    Benter’s model required his undivided attention. It monitored only about 20 inputs—just a fraction of the infinite factors that influence a horse’s performance, from wind speed to what it ate for breakfast. In pursuit of mathematical perfection, he became convinced that horses raced differently according to temperature, and when he learned that British meteorologists kept an archive of Hong Kong weather data in southwest England, he traveled there by plane and rail. A bemused archivist led him to a dusty library basement, where Benter copied years of figures into his notebook. When he got back to Hong Kong, he entered the data into his computers—and found it had no effect whatsoever on race outcomes. Such was the scientific process.

    As I said with the lottery-hacking stories: I need to gamble more. This is statistics’ true purpose, right?

  • People relationships in data analysis

    Roger Peng discusses the importance of managing the relationships between people — analyst, patron, subject matter expert, and audience — for a successful analysis:

    Human relationships are unstable, unpredictable, and inconsistent. Algorithms and statistical tools are predictable and in some cases, optimal. But for whatever reason, we have not yet been able to completely characterize all of the elements that make a successful data analysis in a “machine readable” format. We haven’t developed the “institutions” of data analysis that can operate without needing the involvement of specific individuals. Therefore, because we have not yet figured out a perfect model for human behavior, data analysis will have to be done by humans for just a bit longer.

    Whenever someone touts a tool for “automatic insights”, whether it be in analysis or chart generation, something like this comes to mind.