• Emily Robinson gives advice on applying for a data science job (that you can likely generalize for most tech jobs). For example:

    If you have a GitHub, pin the repos you want people to see and add READMEs that explain what the project is. I also strongly recommend creating a blog to write about data science, whether it’s projects you’ve worked on, an explanation of a machine learning method, or a summary of a conference you attended.

    This is especially true for visualization-heavy jobs. It doesn’t have to be GitHub. You just need a place where others can see your collection of work, so that they can see if it aligns with what they’re looking for. Plus it lets you show off your best stuff.

    And this:

    Rather than applying to every type of data science job you find, think about where you want to specialize. A distinction I’ve found helpful when thinking of my own career and looking at jobs is the Type A vs. Type B data scientist. “A” stands for analysis: type A data scientists have strong statistics skill and the ability to work with messy data and communicate results. “B” stands for build: type B data scientists have very strong coding skills, maybe have a background in software engineering, and focus on putting machine learning models, such as recommendation systems, into production.

    I’ve never formally interviewed for a data science job, and the last job I interviewed for was back in college, I think. So I’m one of the worst people to ask about this stuff, but this seems like good advice.

  • Popular songs on the Billboard charts have always tended to sound similar, but these days they’re sounding even more similar. Andrew Thompson and Matt Daniels for The Pudding make the case:

    From 2010-2014, the top ten producers (by number of hits) wrote about 40% of songs that achieved #1 – #5 ranking on the Billboard Hot 100. In the late-80s, the top ten producers were credited with half as many hits, about 19%.

    In other words, more songs have been produced by fewer and fewer topline songwriters, who oversee the combinations of all the separately created sounds. Take a less personal production process and execute that process by a shrinking number of people and everything starts to sound more or less the same.

  • Visualization is often described in the context of speed and efficiency. Get the most insight for the least amount of ink or pixels. Elijah Meeks argues that visualization goes far beyond this point of view:

    This breakneck pace is a real data visualization constraint. It’s not a myth that charts are often deployed in rooms full of people who only have a short time to comprehend them (or not) and make a decision. Automatic views into datasources are a critical aspect of exploratory data analysis and health checks. The fast mode of data visualization is real and important, but when we let it become our only view into what data visualization is, we limit ourselves in planning for how to build, support and design data visualization. We limit not only data visualization creators but also data visualization readers.

    In the three-parter, Meeks tries to make the fuzzy aspects of visualization — meaning, insight, impact, etc. — more concrete.

    See also:

    Note the dates on all of them. We’ll figure out this visualization thing one of these days.

  • Many surveillance apps cater to parents who want to keep tabs on their children who have mobile phones. Many of these apps are used for less parental purposes. Jennifer Valentino-DeVries for The New York Times reports:

    More than 200 apps and services offer would-be stalkers a variety of capabilities, from basic location tracking to harvesting texts and even secretly recording video, according to a new academic study. More than two dozen services were promoted as surveillance tools for spying on romantic partners, according to the researchers and reporting by The New York Times. Most of the spying services required access to victims’ phones or knowledge of their passwords — both common in domestic relationships.

  • Amazon’s Rekognition is a video analysis system that promises to identify individuals in real-time. Amazon wants to sell the systems to governments for surveillance.

    From the ACLU:

    Amazon is marketing Rekognition for government surveillance. According to its marketing materials, it views deployment by law enforcement agencies as a “common use case” for this technology. Among other features, the company’s materials describe “person tracking” as an “easy and accurate” way to investigate and monitor people. Amazon says Rekognition can be used to identify “people of interest,” raising the possibility that those labeled suspicious by governments — such as undocumented immigrants or Black activists — will be seen as fair game for Rekognition surveillance. It also says Rekognition can monitor “all faces in group photos, crowded events, and public places such as airports,” at a time when Americans are joining public protests at unprecedented levels.

    Given the millions of Alexa-enabled devices in people’s homes and customer purchase histories available on-demand, this feels like a bad idea. Also, creepy. Probably because of the ‘k’ in Rekognition.

  • Wow your friends during the game with random win percentages, based on various player stats.

  • Ken Auletta for The New Yorker looks at “math men” replacing the Mad Men:

    Engineers and data scientists vacuum data. They see data as virtuous, yielding clues to the mysteries of human behavior, suggesting efficiencies (including eliminating costly middlemen, like agency Mad Men), offering answers that they believe will better serve consumers, because the marketing message is individualized. The more cool things offered, the more clicks, the more page views, the more user engagement. Data yield facts and advance a quest to be more scientific—free of guesses. As Eric Schmidt, then the executive chairman of Google’s parent company, Alphabet, said at the company’s 2017 shareholder meeting, “We start from the principles of science at Google and Alphabet.”

    I know the big tech companies are where the money is, but I hope you young statisticians out there consider the other possibilities. Your skills are valued in many places.

  • Nigel Holmes, the graphic designer known for his playful illustrated graphics, has a new book: Crazy Competitions. It’s exactly what it sounds like.

    Whether it’s flinging frozen rats or parading in holly evergreens, racing snails or carrying wives, human beings have long displayed their creativity in wild, odd, and sometimes just wonderful rituals and competitions. To show what lengths we’ll go to uphold our eccentric customs, British American graphic designer Nigel Holmes channels his belief in the power of hilarity to bring together a bewilderingly funny tour around the globe in search of incredible events, all dryly explained with brilliant infographics in Crazy Competitions.

    Ordered.

  • Food trucks are the real deal these days. The best ones serve a specialized menu really well, in a small, focused space. The Washington Post delves into the insides of several of these trucks and how they make the food with very specific equipment.

  • Educate Your Child by Gabrielle LaMarr LeMee uses census data and the school selection process to simulate the steps you might take in choosing your kid’s first school in Chicago.

    The Chicago public school system has a high level of school segregation as a result of parents’ residential and school choices as well as policy decisions that do not encourage integrated neighborhoods and schools.

    In this game, you are a parent of a 5-year-old child and now you have to make some decisions. Explore how your choices can have an impact on your child’s education and on the overall education of the city’s children.

    There should be more games like this based on census data. It seems to be a good way for an individual to latch on to data points while still getting a view of the grand scheme of things.

    See also more on LeMee’s design for details on modeling school choice.

  • Imagine that those with immigrants in their family tree left the country. Almost everyone, basically.

  • Adam Pearce for The New York Times describes the sad state of affairs that is the delayed subway trains in New York. One delay causes a ripple effect down the line, leaving little chance to get back on track. The more straightforward figures gear you up for the overall view at the end.

    This was for New York specifically, but it applies to other transit systems and forms of transportation. See also the traffic gridlock simulation from a few years ago. It doesn’t take much for gridlock.

  • Financial Times recently updated their style guide:

    data — the rule for always using data as plural has been relaxed. If you read data as singular then write it as such. For example, we already allow singular for ‘big data’. And we should for personal data too. An easy rule would be that if it can be used as a synonym for information then it should probably be singular — and if we are using it as economic data and mean figures, then we should stick to plural.

    And for kicks, I dug up my New York Times style guide from 1999:

    data is acceptable as a singular term for information: The data was persuasive. In its traditional sense, meaning a collection of facts and figures, the noun can still be plural: They tabulate the data, which arrive from bookstores nationwide. (In this sense, the singular is datum, a word both stilted and deservedly obscure.)

    “Data are” sounds weird to me most of the time. When I say it like that, I feel like I should also drink a cup of tea with my pinky sticking out and a monocle firmly planted for distinction.

  • CBS News picked up four used photocopiers and looked at the hard drives. There was a lot of private information stored in them:

    Nearly every digital copier built since 2002 contains a hard drive – like the one on your personal computer – storing an image of every document copied, scanned, or emailed by the machine.

    In the process, it’s turned an office staple into a digital time-bomb packed with highly-personal or sensitive data.

    If you’re in the identity theft business it seems this would be a pot of gold.

    “The type of information we see on these machines with the social security numbers, birth certificates, bank records, income tax forms,” John Juntunen said, “that information would be very valuable.”

    Okay, dramatics aside, it’s still disconcerting.

    Not every photocopier makes it so easy to access copied documents, but it’s surprising that it’s still so straightforward with some machines these days. Then again, part of the responsibility belongs to the previous owners. As the Federal Trade Commission instructs, it’s like getting rid of a computer.

  • Simone Giertz, bringer of joy and self-described expert in shitty robots, makes machines that succeed in failing. In her TED talk, Giertz talks about her path from “useless” things to expert. It’s all the more relevant after she found out she has a brain tumor.

  • Eric Rodenbeck from Stamen Design discusses visualization the medium over visualization the tool or the insight-providing image:

    Dataviz! Data visualization! I don’t think it’s for anything! I don’t believe it’s meaningful to say that dataviz is for one thing, any more than it’s meaningful to say that architecture is for any one thing. Or that photography is for one thing, that it has a purpose that can be defined in a sentence or two. Or that movies are for one thing, one that you could win an argument about.

    Yes.

    See also Martin Wattenberg and Fernanda Viegas’ talk from a while back on the parallels between books and visualization.

  • The 2020 Census is coming up quick, but there’s still a lot up in the air. There’s no director, the bureau has to adjust to budget cuts, and a new digital system that promises to save money hasn’t been fully tested (because of lower funding). Exciting. Alvin Chang for Vox explains in more detail — with cartoons.

  • Most of us don’t read the terms and conditions before we click on “I agree” for the web services we use. They’re too long, and we need likes right away. For a student project, Dima Yarovinsky printed the terms and conditions on paper for major social apps — WhatsApp, Google, Tinder, Twitter, Facebook, Snapchat, and Instagram, respectively — which highlights what we’re getting into. [via @hailmika]

  • Members Only

    Moving your data from the digital screen to something more physical isn’t as tricky as it seems. Here’s how I did it.

  • By Neil Freeman, the @everytract bot on Twitter, as the name suggests, is tweeting a map of every Census tract in numerical order. It’s one map each half hour.

    Census data, or data in general really, is typically in aggregate or about the overall trends, which requires an abstract view of a bunch of data points pushed together. So it’s nice to see a straightforward project put focus on the individual.

    Of this genre, the censusAmericans bot is my favorite. It tweets people’s biographies based on data from the American Community Survey.