Statistics

  • Search how phrases have been used via Google Ngram Viewer

    December 20, 2010 to Data Sources, Online Applications  •  Nathan Yau

    Ngram - kindergarten

    Language changes. Culture changes. And we can see some of these changes via what authors write about in books over the years. Google's Book Ngram Viewer lets you search through this data and shows a graph similar to the output of Google Trends. Above are the trends for nursery school, kindergarten, and child care:

    This shows trends in three ngrams from 1950 to 2000: "nursery school" (a 2-gram or bigram), "kindergarten" (a 1-gram or unigram), and "child care" (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are "nursery school" or "child care"? Of all the unigrams, what percentage of them are "kindergarten"? Here, you can see that use of the phrase "child care" started to rise in the late 1960s, overtaking "nursery school" around 1970 and then "kindergarten" around 1973. It peaked shortly after 1990 and has been falling steadily since.
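
    To make the y-axis concrete, here is a minimal sketch of that calculation on a made-up sentence. The function names and the toy text are mine, not Google's, and the real viewer works on a far larger corpus with its own tokenization and per-year counts.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_share(tokens, phrase):
    """Percentage of all n-grams (n = number of words in phrase) equal to phrase."""
    n = len(phrase.split())
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return 100.0 * Counter(grams)[phrase] / len(grams)


# Toy corpus, invented for illustration only
text = "child care and nursery school and kindergarten and child care"
tokens = text.split()

for phrase in ("nursery school", "kindergarten", "child care"):
    print(phrase, round(ngram_share(tokens, phrase), 2))
```

    Note that the bigrams are compared against all bigrams and the unigram against all unigrams, which is why the three lines can share one chart.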

    Find anything interesting?

  • Right versus wrong bubble size

    December 17, 2010 to Mistaken Data  •  Nathan Yau

    Subsidize This from Good Magazine

    I was going to post this graphic from Good when it came out, but decided not to. I made the same mistake when I first started out. It was another case of wrongly sized bubbles. But they fixed the problem, so now we can see what a big difference it makes.
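
    The excerpt doesn't spell out the mistake, but the usual one with bubbles is scaling the radius by the value instead of the area, which exaggerates differences. As a rough sketch under that assumption (the function and parameter names are hypothetical), sizing by area looks like this:

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to value.

    Area grows with the square of the radius, so the radius must
    scale with the square root of the value.
    """
    return max_radius * math.sqrt(value / max_value)


# A value one quarter of the maximum gets half the radius,
# which works out to one quarter of the area.
r = radius_for_value(25, 100, 10)
print(r, (r / 10) ** 2)
```

    Scaling the radius directly would instead give that same value a bubble with only one sixteenth of the largest bubble's area.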

  • Data analysis is the future of journalism

    December 8, 2010 to Statistics  •  Nathan Yau

    Tim Berners-Lee, credited with inventing the Web, says analyzing data is the future of journalism:

    "Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way some times.

    "But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country."

    The Guardian post focuses on current journalists learning new skills, but what we're also going to see is a new type of person — computer scientists, statisticians, and interaction designers — become the storytellers.

  • Jon Stewart explains Wikileaks’ Cablegate

    December 2, 2010 to Data Sources, News  •  Nathan Yau

    You've probably already heard and read about Wikileaks' Cablegate. If not, Andy Baio has a fine roundup with significant coverage and events to get you caught up quick. Alternatively, you can watch Jon Stewart and The Daily Show explain in the clip below (slightly NSFW, because it mentions a body part).

  • The Joy of Stats with Hans Rosling

    November 30, 2010 to Statistics, Visualization  •  Nathan Yau

    Hans Rosling on development

    The Joy of Stats, a one-hour documentary, hosted by none other than the charismatic Hans Rosling, explores the growing importance of statistics:

    [W]ithout statistics we are cast adrift on an ocean of confusion, but armed with stats we can take control of our lives, hold our rulers to account and see the world as it really is. What's more, Hans concludes, we can now collect and analyse such huge quantities of data and at such speeds that scientific method itself seems to be changing.

    From the description, it sounds like they'll touch on Crimespotting by Stamen and Google Translation, among other data-driven projects. Whatever they cover, it's bound to be interesting with Rosling at the front.

  • How do people use Firefox?

    November 30, 2010 to Data Sources, News  •  Nathan Yau

    Mozilla Labs just released a bunch of anonymized browsing data for their open data visualization competition:

    This competition is based on Mozilla's own open data program, Test Pilot. Test Pilot is a user research platform that collects structured user data through Firefox. All data is gathered through pre-defined Test Pilot studies, which aim to explore how people use their web browser and the Internet.

    There are two datasets in various formats. The first is browsing behavior from 27,000 users, including on/off private browsing that we saw a few months ago. The second dataset is from 160,000 users and is on how they actually use the Firefox interface.

    Additionally, both sets have survey answers to questions like "How long have you used Firefox?" which could make for some fun and interesting breakdowns.

    The deadline is December 17.

    [Mozilla Labs]

  • Statistics vs. Stories

    November 29, 2010 to Statistics  •  Nathan Yau

    John Allen Paulos, professor of mathematics at Temple University, describes the differences between statistics and stories:

    [T]here is a tension between stories and statistics, and one under-appreciated contrast between them is simply the mindset with which we approach them. In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled.

    And he concludes:

    The focus of stories is on individual people rather than averages, on motives rather than movements, on point of view rather than the view from nowhere, context rather than raw data. Moreover, stories are open-ended and metaphorical rather than determinate and literal.

    Which way do we go when we start telling stories with data?

    [New York Times via @joandimicco]

  • R is the need-to-know stat software

    November 17, 2010 to Software, Statistics  •  Nathan Yau

    This Forbes post on the greatness that is R is being passed around by every statistician and his mother today.

    It's not that this type of analysis wasn't possible before — statisticians have existed, and commercial software has been available to support them, for decades. The fact that R is free to use, free to modify, and its source is open to view, extend and improve means students, stock traders-in-training and fantasy football junkies can familiarize themselves with the software. They can write programs against it. They're likely to continue that usage into their professional lives. When they share their work, the community, down the line, benefits. And the virtuous cycle strengthens.

    What's your favorite (graphical) use of R?

  • Recalls for March

    Making recalls and market withdrawals more accessible

    Last week I found out that the FDA has a feed for all product recalls and market withdrawals since 2009...

  • Simple analysis makes Expedia extra $12m

    November 5, 2010 to Statistics  •  Nathan Yau

    There was a problem on Expedia where a lot of people were choosing their itinerary, entering their information and then dropping off after they clicked on the Buy Now button. It's like getting to the cash register at a store, and the cashier says they can't take your money.

    So analysts took a look and found that the field to enter your company was confusing people, leading to the input of an incorrect address. "After we realised that we just went onto the site and deleted that field — overnight there was a step function [change], resulting in $12m of profit a year, simply by deleting a field."

    Not bad for a little bit of data digging. I hope the analysts got a bonus.

    That said, not every decision has to be driven by data. Balance is good.

    [Silicon via @jpmarcum]

  • Stat concepts to the tune of Gershwin

    October 29, 2010 to Statistics  •  Nathan Yau

    Stat people will probably find this amusing. For the rest, this might make your head explode. Gurdeep Stephens and Michael Greenacre perform classic songs but use statistical concepts for lyrics. Here's Summertime, originally by George Gershwin, turned into a song about statistical modeling (video below).

    It's summertime,
    Statistical modelling is easy,
    Data are fitting,
    Explained variance is high.
    Your data are rich,
    And your model's good-looking,
    So hush, statisticians, don't you cry...


  • Opportunities in Government 2.0

    October 27, 2010 to Data Sources  •  Nathan Yau

    Vivek Wadhwa talks government data and the (financial) opportunities ripe for the picking:

    What is happening with the opening up of government data is nothing less than a silent revolution. There are literally thousands of new opportunities to improve government and to improve society—and to make a fortune while doing it. Unlike the Web 2.0 space, which is overcrowded, Gov 2.0 is uncharted territory: a new frontier to explore, grow things on, and settle on. It’s fresh soil for unlikely seedling ideas that, if they take root, could lead to very successful ventures. So I encourage entrepreneurs to stake their claims as soon as they can.

    Wait a minute. Hold up. You can do more with government data than awkward dashboards? Bring it.

    [TechCrunch via @ucdatalab]

  • A different analytical wall

    October 14, 2010 to Statistics  •  Nathan Yau

    In reference to the wall between reporting data and understanding it, Martin Theus proposes a different one:

    Once you start to explore the data, the whole thing stops to be linear but gets to be very iterative, jumping over the wall every now and then. I.e., you may find out that the data cleaning is insufficient, or the model you have in mind needs some other transformation of the data, or you might want to collect additional or other data altogether.

    The wall does exist, but I think it is more separating two kinds of people / thinking.

    Theus finishes:

    One thing is for sure: we won’t succeed if analysts continue to build useful but technically insufficient tools and computer scientists still build fancy tools that merely help the analysts.

    Or even better: analyst and tool builder become the same person. That'll take much longer though, so communication is a good place to start.

    [Theusrus]

  • OkCupid explores gay and straight stereotypes

    October 12, 2010 to Statistics  •  Nathan Yau

    Online dating site OkCupid dives into their data for 3.2 million users again, this time to explore gay and straight stereotypes. Many are false. Some are true. Among the findings: who's gay curious in the United States and who thinks the earth is bigger than the sun.

  • The simple truth about statistics

    October 10, 2010 to Quicklinks, Statistics  •  Nathan Yau

    Matt Parker explains why no one should be fooled by a misuse of statistics just like no one was fooled by "I did not have sexual relations with that woman."

  • Human-centric analysis

    September 21, 2010 to Statistics  •  Nathan Yau

    BI has hit the wall

    Stephen Few recently spoke at Tableau’s Customer Conference, and brought up the above slide. He explains:

    All of the traditional BI software vendors and most of the industry’s thought leaders are stuck on the left side of the wall. The software vendors that are providing effective data sensemaking solutions—those that make it possible to work in the realm of analytics on the right side of the wall—have come from outside the traditional BI marketplace.

    This isn't just business intelligence. It's everywhere that analyzes or works with data. It's also not just the software. It's the person who does the analysis. More so, actually. You have to get over the wall to really get something out of your data. Otherwise, you're just a drone doing a computer's work when it should be the other way around.

  • The real stuff white people like

    September 13, 2010 to Statistics  •  Nathan Yau

    Stuff white males like according to OkCupid

    Online dating site OkCupid continues their run of amusing yet thorough analysis of their users. This time: the real stuff white people like. Well actually, the stuff that all races like:

    We selected 526,000 OkCupid users at random and divided them into groups by their (self-stated) race. We then took all these people's profile essays (280 million words in total!) and isolated the words and phrases that made each racial group's essays statistically distinct from the others'.

    Top phrase for white males? Tom Clancy. White female? The Red Sox. Black males? Soul food. Black females? Soul food. Asian males? Taiwan. Asian females? Coz. Yeah, I don't know what that is either.

    [Thanks, John]

  • Various ways to rate a college

    September 8, 2010 to Network Visualization, Statistics  •  Nathan Yau

    Measures for different college ratings

    There are a bunch of college ratings out there to help students decide what college to apply to (and give alumni something to gloat about). The tough part is that there doesn't seem to be any agreement on what makes a good college. Alex Richards and Ron Coddington describe the discrepancies.

  • Simple data converter from Excel

    September 6, 2010 to Online Applications, Statistics  •  Nathan Yau

    If you've ever created an interactive graphic or anything else that requires that you feed in data, you will love this barebones data conversion tool by Shan Carter. Copy and paste data from Excel, which I feel like I've done a billion times, and then take your pick from Actionscript, JSON, XML, and Ruby. Simple, but a potential time saver. [via]
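
    The core of a conversion like this is small. Here is a rough Python sketch of the JSON case, assuming the cells copied from Excel arrive as tab-separated text with a header row. The function name is hypothetical, and this is not Carter's code.

```python
import csv
import io
import json


def excel_paste_to_json(pasted):
    """Convert tab-separated text (as copied from a spreadsheet) to a JSON array.

    The first row is treated as column names. Every value stays a string;
    no type inference is attempted.
    """
    reader = csv.DictReader(io.StringIO(pasted), delimiter="\t")
    return json.dumps(list(reader), indent=2)


# Two columns and two data rows, written the way a clipboard paste would look
sample = "name\tvalue\nA\t1\nB\t2\n"
print(excel_paste_to_json(sample))
```

    Numbers come through as strings here, so a real tool would likely add an option to convert them.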

  • Statistical literacy guides for the basics

    September 3, 2010 to Statistics  •  Nathan Yau

    Guide to statistical charts - before and after

    You can get pretty far with data graphics with just limited statistical knowledge, but if you want to take your skills, resume, and portfolio to the next level, you should learn standard data practices. Of all places, the UK Parliament has some short and free guides to help you with basic statistical concepts. They provide 13 notes, each only two or three pages long, that can help you with stuff like how to adjust for inflation, confidence intervals and statistical significance, or basic graph suggestions [pdf]. I like.

    [via | Thanks, @joemako]

Copyright © 2007-2012 by FlowingData. Hosted by Media Temple.