• Prostitution, GDP, and £1.7 billion due

    Posted to Statistics

    David Spiegelhalter, professor of public understanding of risk, does some back-of-the-napkin math to describe why recent prostitution estimates for the UK are problematic.

    As always, it's best to do a simple reality check. The ONS assumptions come to around 61,000,000 visits a year. Let's say 50,000,000 are from locals rather than foreign visitors. There are around 27,000,000 men between 18 and 50 in the UK (taking an arbitrary upper limit), so this would mean that on average each of them buys sex twice a year. In fact the latest Natsal survey found that 3.6% of men reported paying for sex in the last 5 years - let's say that means that considerably less than 1,000,000 men a year pay for sex, maybe 500,000. So the ONS assumptions mean that men who pay for sex do so on average twice a week. This seems high.

    The assumptions also mean that the average person working in prostitution is turning over nearly £100,000 a year, which Jolyon from Tax Relief 4 Escorts says is completely implausible, and he should know.
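
    The reality check above is just a few divisions, so it is easy to rerun with your own guesses. Here is a minimal sketch using only the figures quoted from Spiegelhalter; the variable names are mine, and the 500,000 figure is his rough guess rather than a measured value.

```python
# Figures quoted from Spiegelhalter's reality check
local_visits_per_year = 50_000_000   # visits assumed to come from locals
men_18_to_50 = 27_000_000            # men aged 18 to 50 in the UK
paying_men_per_year = 500_000        # his rough guess, informed by Natsal

# Spread the visits over all men in the age range
visits_per_man = local_visits_per_year / men_18_to_50

# Spread the same visits over only the men assumed to pay for sex
visits_per_paying_man = local_visits_per_year / paying_men_per_year
visits_per_week = visits_per_paying_man / 52

print(f"Per man aged 18-50: {visits_per_man:.2f} visits a year")
print(f"Per paying man: {visits_per_paying_man:.0f} visits a year, "
      f"or {visits_per_week:.2f} a week")
```

    That comes to roughly two visits a year per man in the age range, and about 100 a year (close to two a week) per paying man, which matches the quote.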

    Spiegelhalter makes a few of his own assumptions in there, but you can see why estimating illegal activity and then using those numbers to calculate gross domestic product can be a challenge.

    If you recall, the gross domestic product for the United Kingdom rose by 5 percent, in large part due to estimates that try to account for drug sales and prostitution. Given that illegal activity and careful, public record-keeping typically don't go together, the new numbers were rough at best. For prostitution in particular, the Office for National Statistics estimated an extra £5.3 billion added to the GDP.

    The problem now is that the United Kingdom, as a member of the European Union, apparently owes £1.7 billion. This is based on gross national income, which uses gross domestic product in its calculation. Ouch. Consequences.

  • Decline of women in computer science

    NPR spent some time on the subject of the decline of women in computer science. Whereas the percentage of women in other technical fields rose, the percentage of women in computer science declined, as shown in the chart above. Although it's tough to pinpoint a single factor, the time of decline coincides with when computers were mostly marketed towards boys in the 1980s.

    In the 1990s, researcher Jane Margolis interviewed hundreds of computer science students at Carnegie Mellon University, which had one of the top programs in the country. She found that families were much more likely to buy computers for boys than for girls — even when their girls were really interested in computers.

  • Chess piece survival rates

    On Quora, someone asked, "What are the chances of survival of individual chess pieces in average games?" Oliver Brennan answered by looking at the moves from 2.2 million games.

    The situation with the c-d-e pawns is very interesting. The most survivable central pawn is the White c-pawn (42%), while White's d-pawn is the most doomed of all the chessmen (24%) - more so even than the knights (~26%). There's a pleasing symmetry in the survival rates of the White and Black c- and e-pawns that suggests they're frequently exchanged on the d-file. Bishops survive around 35% of the time, with the kingside bishops slightly more likely to survive than queenside ones.

    Code available if you want to poke. [via Know More]

  • Statistically ignorant

    Posted to Statistics

    Ipsos MORI, primarily a marketing research group I think, released results of their study on public perception of demographics versus reality, on numbers such as immigration, religion, and life expectancy. The key takeaway is that out of the people they polled from fourteen countries, the average person typically over- or underestimated — by a lot.

  • Halloween costume rankings

    Posted to Infographics

    Accompanying their segment on Halloween stores stocking costumes, NPR ranks bestsellers for the past four years, based on data from the National Retail Federation. Note that these are rankings for adult costumes, so it's safe to assume that all of these costume names are preceded by "sexy." (Kidding.)

    I'm surprised there aren't more topical costumes towards the top. For example, the segment touches on Walter White costumes flying off the shelves last year, but I'm guessing the data probably only covers the pre-packaged stuff. Also guessing a similar reason for why Superman and Batman aren't counted as generic superhero, or Dracula as vampire.

    See the full graphic on NPR.

  • Breakout detection in R

    Posted to Software

    Say you have time series data and you want to detect significant changes, but there's also a lot of noise to sift through. Twitter released an open source R package, BreakoutDetection, to help with that.

    Our main motivation behind creating the package has been to develop a technique to detect breakouts which are robust, from a statistical standpoint, in the presence of anomalies. The BreakoutDetection package can be used in wide variety of contexts. For example, detecting breakout in user engagement post an A/B test, detecting behavioral change, or for problems in econometrics, financial engineering, political and social sciences.

    Was a quick installation and worked as expected for me. Twitter has released plenty of open source projects, but I think this is the first R package. Nice.
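
    To get a feel for what "robust in the presence of anomalies" means, here is a toy sketch in Python. It is not the method used by BreakoutDetection, and `find_breakout` and `min_size` are hypothetical names. It just looks for the single split point where the medians of the two sides differ most, which is enough to show why medians shrug off a lone spike.

```python
from statistics import median

def find_breakout(series, min_size=5):
    """Return the index that best splits the series into two segments
    with different medians, or None if no split scores above zero."""
    best_index, best_score = None, 0.0
    for i in range(min_size, len(series) - min_size + 1):
        left, right = series[:i], series[i:]
        # Weight the median gap by segment sizes so lopsided splits
        # are penalized relative to balanced ones
        gap = abs(median(left) - median(right))
        score = gap * len(left) * len(right) / len(series)
        if score > best_score:
            best_index, best_score = i, score
    return best_index

# Level shifts from about 10 to about 20 at index 10; the 50 is an anomaly
noisy = [10, 11, 9, 10, 50, 10, 11, 9, 10, 11,
         20, 21, 19, 20, 21, 20, 19, 21, 20, 20]
print(find_breakout(noisy))
```

    The spike at index 4 barely moves the median of the left segment, so the split still lands where the level actually changes. A mean-based version would be pulled around much more by that one point.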

  • Neurons conversing

    Posted to Network Visualization

    Adam Cohen and his group are using genetically modified neurons that light up when the cells activate to see the communication between neurons in high detail.

    More from New Scientist:

    Cohen's team is using the technique to compare cells from typical brains with those from people with disorders such as motor neuron disease or amyotrophic lateral sclerosis. Rather than taking a brain sample, they remove some of the person's skin cells and grow them alongside chemicals that rewind the cells into an embryonic-like state. Another set of chemicals is used to turn these stem cells into neurons. "You can recreate something reminiscent of the person's brain in the dish," says Cohen.

    Couple that with super slow motion video. Then patterns.

  • Affordable Care Act progress report

    The New York Times takes a data-centric look at the progress of the Affordable Care Act here in the United States. It's a team-effort seven-parter describing changes in uninsured percentages, affordability, and changes to the health care industry as a whole. Probably want to save this one for later.

  • Curse of dimensionality, interactive demo

    Jeff Leek was trying to explain the curse of dimensionality and realized that there had to be a better way! Leek's student Prasad Patil cooked up an interactive to demonstrate the curse.

    From Leek:

    I recently was contacted for an interview about the curse of dimensionality. During the course of the conversation, I realized how hard it is to explain the curse to a general audience. One of the best descriptions I could come up with was trying to describe sampling from a unit line, square, cube, etc. and taking samples with side length fixed. You would capture fewer and fewer points. As I was saying this, I realized it is a pretty bad way to explain the curse of dimensionality in words.

    Here's the Wikipedia page on the curse, if you like. Or you can just give Patil's interactive a whirl.
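
    Leek's sampling description is also easy to simulate. This is a rough sketch of that idea, not Patil's interactive: scatter points uniformly in a unit line, square, cube, and so on, then count how many land inside a box with a fixed side length. The function name, the side length of 0.5, and the number of points are my own assumptions.

```python
import random

def captured_fraction(dimensions, side=0.5, n_points=20_000, seed=0):
    """Fraction of uniform random points in the unit hypercube that
    fall inside the corner box whose sides all have length `side`."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(n_points):
        point = [rng.random() for _ in range(dimensions)]
        # A point is captured only if every coordinate is inside the box
        if all(coordinate < side for coordinate in point):
            captured += 1
    return captured / n_points

for d in (1, 2, 3, 10):
    # Simulated fraction next to the exact value, side ** d
    print(d, captured_fraction(d), 0.5 ** d)
```

    With the side fixed at 0.5, the box holds about half the points on a line, a quarter in a square, an eighth in a cube, and about a tenth of a percent in ten dimensions. Same box, fewer and fewer points.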

  • Data Fluency is Out Now

    Posted to Data Fluency

    Data Fluency: Empowering Your Organization with Effective Data Communication, by Zach and Chris Gemignani, is the latest addition to the FlowingData book series.

    You can order it now.

  • Job Board, October 2014

    Posted to Job Board

    Looking for a job in data science, visualization, or statistics? There are openings on the board.

    Business Intelligence Analyst for American Speech-Language-Hearing Association in Rockville, Maryland.

    Front End Developer for Seed Scientific in New York.

    Director of Visualization Services for North Carolina State University Libraries in Raleigh, North Carolina.

    Middleweight Designer for Information is Beautiful Studio in Shoreditch, London.

  • New York Times mapmakers

    Posted to Mapping

    When news breaks, maps often accompany stories (or the maps are the story), and cartographers and graphics people have to work quickly. The New York Times does this really well. Cartographer Tim Wallace of the New York Times describes some of the process for Wired. I like the bit about uncertainty.

    They also have to deal with incorporating uncertainty into their maps. A recent map of territory held by ISIS in Iraq and Syria, for example, uses blurry red and yellow shading to indicate regions controlled by ISIS and areas of recurring attacks. The same map uses light grey hatching to indicate sparsely populated regions. "You don't want to put a hard line around that," Wallace said. "It's not like you cross a river and all of a sudden it's sparsely populated."

    When I was over there as a lowly graphics intern years ago, I was always impressed by the map department. Actually, I think the map department had just been combined with graphics to work more closely together. Maybe they split them back up again. Anyways, they sit next to each other, and I was impressed by everyone.

    I'd occasionally make location maps — mostly small stuff with a few dots on them. Then I'd give them to the map department for checking. Their speed and accuracy were always top notch, which was a fine way for me to see how much I had to learn.

  • Visual summary of skateboarding tournament

    Posted to Infographics

    George Murphy visualized the results of this year's skateboarding tournament Battle at the Berrics 7. Even if you don't like or know anything about skateboarding, this is a fun one to scroll through.

    Skaters match up head-to-head in a bracket format, and compete in a style similar to the basketball game of H-O-R-S-E. One person does a trick, and if completed cleanly, the other person has to match. If the second person fails to match, he or she receives a letter. The first person to spell S-K-A-T-E loses.

    Murphy takes you through the tournament with video clips and transitions through a handful of charts. You see how a match plays out and what individual skaters did. Fun.

  • Moving Past Default Charts

    Posted to Tutorials

    For static data graphics my workflow typically involves R and Illustrator to varying degrees. I covered the process in Visualize This and provided an introduction on how to do the same with Inkscape, Illustrator's open source counterpart. However, you don't always have to use illustration software to produce more readable graphics.

    You can stay in R, tweak a few variables, and it might be all you need. If not, you can at least get closer to what you want, which makes for less post-editing. In this tutorial you learn what parameters to change to mimic a handful of popular chart styles.

  • F1 racing winners and age

    Posted to Infographics

    So here's a sport I don't see or hear much about. F1 racing, which requires a different sort of strength and agility than, say, football or basketball, has a wide range of ages. Drivers can be in their teens. Some are in their late 40s (and successful). Peter Cook visualized the ages and races of drivers through F1 racing history, since 1950.

    Each row represents a driver's career, and each color-coded dash in a row represents a race. Colors indicate wins, a trip to the podium, and a top 10 finish.

    My favorite part is the tour on initial load. The interactive points out highlights in the data, such as the youngest, oldest, and drivers of interest.

  • Cynthia Brewer profile

    Posted to Mapping

    Wired wrote a short profile for Cynthia Brewer, best known for Color Brewer, a tool that provides visually apt color schemes for maps (and charts).

    Brewer has been thinking about these issues since her graduate days at Michigan State. But the idea for Color Brewer grew out of a sabbatical she did with the U.S. Census Bureau, overseeing the atlas that accompanied the 2000 Census. "We were trying to be really systematic with color throughout the atlas," she said. Other mapmakers liked the color sets they developed and began asking for them, and Brewer set up Color Brewer to make them more readily available.

    If you've looked at thematic maps at all, you've likely come across a color scheme from Color Brewer. I wouldn't say it's ubiquitous quite yet, but it's close. I just like how something so widespread came from a couple of people in a room who wanted to streamline the process of putting together the decennial atlas.

  • Your life on Earth

    Posted to Infographics

    The BBC has a fun piece that shows changes over your lifetime. Enter your date of birth, gender, and height, and you get personalized data nuggets, categorized by how you changed, how the world changed, and how people changed the world during your years on this planet.

    For me: 161 major volcano eruptions, 72 solar eclipses, and a 2.7 billion increase in global population.

    Naturally, as with most global numbers, these are based on estimates from a wide range of sources, so keep that in the back of your mind as you scroll.

  • A healthy versus unhealthy office environment

    Posted to Infographics

    In an interesting use of the before-and-after slider, this Washington Post graphic by Bonnie Berkowitz and Laura Stanton contrasts an unhealthy office environment against a healthy one.

    As a whole, the graphic represents a full office, broken into sections that each show an unhealthy environment on the left and a healthy one on the right. For each section, slide all the way to the left or right to see a fuller picture of the respective habits, covering topics such as ergonomics, hygiene, and air quality.

    FYI: Rats and dead plants send the wrong message to your employees.

  • Data Fluency, Coming Soon

    Posted to Data Fluency

    There's a new addition to the FlowingData book series on the way. It's called Data Fluency: Empowering Your Organization With Effective Data Communication. It's by the founders of Juice Analytics, Zach and Chris Gemignani, and is available for pre-order at the major online booksellers. Copies are also making their way to the brick-and-mortars.

    Nice.

    As I assumed the technical editor role for the first time, I'll talk more about the book soon, but Zach and Chris probably sum it up best:

    Our hope is that this book starts a new kind of conversation in the analytics field — one that incorporates the people side as much as the tools, techniques, and technologies. We hope it spurs individuals and organizations to start on a journey toward making data a more useful tool for sharing ideas.

    Pre-order it on Amazon.

  • Map of book subjects on Internet Archive

    The Internet Archive makes millions of digitized books available in the form of scanned pages, and these books are categorized into thousands of subjects. Focusing on book images, Mario Klingemann mapped subjects, based on tag similarity. Browse and discover new reading material.

    This map offers an alternative way to browse the 2,619,833 images contained in the Internet Archive's book collection. It shows 5500 different subjects which have been algorithmically arranged by their thematic relationships. The size of each link resembles the amount of images that are available for that topic. Clicking on a link will open the flickr page containing all the pictures for that subject. Rolling over a link will highlight all the topics that have a direct link with the subject.

    I recommend browsing towards the middle in the medical cluster for some weird, old-school healing techniques.