• What Do People Want to Do With Their Lives?

    June 17, 2008  |  Data Sources, Projects, Visualization

    43 Things is a goal-setting community where people set goals, cheer each other on, and connect with others who are trying to achieve the same thing. Even if you're not setting goals yourself, it's still interesting and often amusing to see what others have set out to do, e.g., go skinny dipping, have a one night stand, and be myself.
    Continue Reading

  • Our Non-ability to Misunderstand Statistics of Rare Events

    June 4, 2008  |  Statistics

    Cory Doctorow from The Guardian writes about our inability to understand the statistics of rare events. We obsess so much over the near-impossible probability that something could happen that it clouds our vision of more probable events.

    The rare - and the lurid - loom large in our imagination, and it's to our great detriment when it comes to our safety and security. As a new father, I'm understandably worried about the idea of my child falling victim to some nefarious predator Out There, waiting to break in and take my child away. There's a part of me who understands the panicked parent who rings 999 when he sees some street photographer aiming a lens at a kids' playground.

    But the fact is that attacks by strangers are so rare as to be practically nonexistent. If your child is assaulted, the perpetrator is almost certainly a relative (most likely a parent). If not a relative, then a close family friend. If not a close family friend, then a trusted authority figure.

    Says Doctorow, such misunderstanding is why we gamble in casinos and why we have to wait in long security lines at the airport. We see piles of money and terrorist attacks when ultimately, the chances that you'll win a jackpot or run into violence are far smaller - near impossible - than the chances of losing all of your money or losing valuables to a curious luggage handler.

    If there's one thing the government and our educational institutions could do to keep us safer, it's this: teach us how statistics works.

    Amen to that.

    [Thanks, Jan]

  • Statistics is a Diverse Field With Different Paths of Study

    May 26, 2008  |  Statistics

    Rows in a Field
    Photo by Duncan H

    One of the huge factors that drew me into statistics is that you can apply it to so many different areas of study. When someone asks me what the job market is like for someone in statistics, I always tell them, "Wherever there's data, there's a job for a statistician to fill. Marketing, biology, traffic, finance, crime..."

    It's also my way of answering, "What are you going to do when you graduate?" In other words, I'm not sure yet. I keep running into more and more fun stuff I can do with my degree, so it's hard to decide right now. But hey, it's better to have too many paths to choose from than not enough, right?

    Interdisciplinary Statistics

    In the most recent Amstat News is a short article - Statistics as an Interdisciplinary Science:

    An issue touched on briefly is statistics as an interdisciplinary science. I think there is a general agreement that (almost) all other scientific disciplines need statistics (and statisticians).

    Speaking to people outside of the field, there's this idea that statistics is very focused (which it is in some ways, I guess) and very narrow, but it's pretty much whatever you want it to be. You can focus completely on, say, crime, or you can be more broad and examine issues in social science, for example.

    It's like design or computer science. You might use your skills for very specific areas like page layout or web programming, but just as easily, you could use that know-how on a broad range of projects.

    In summary, statistics is awesome. What have you used statistics for lately?

  • U.S. Census Bureau’s 2008 Statistical Abstract – Looking at America’s Data

    May 21, 2008  |  Data Sources

    The U.S. Census Bureau released their 2008 Statistical Abstract, the National Data Book, not too long ago (um, like in January). There are state rankings and data in 30 categories and many more sub-categories. All this data is in the form of PDFs and Excel spreadsheets, which doesn't lend much to readability, but still, it's nice to have access to all the information.

    Maybe FlowingData readers can put together a giant statistical abstract all conveyed through graphics. That would be cool. Above are six data sets that I picked from the billion or so available.

  • The Safest Seat to Sit In On a Plane is…

    May 20, 2008  |  Statistics

    Safest seat to sit on a plane

    Popular Mechanics did a study on where it was safest to sit on an airplane based on all commercial jet crashes since 1971. Contrary to expert statements that "one seat is as safe as the other," the study found that it is safer to sit in the back.

    The funny thing about all those expert opinions: They're not really based on hard data about actual airline accidents. A look at real-world crash stats, however, suggests that the farther back you sit, the better your odds of survival. Passengers near the tail of a plane are about 40 percent more likely to survive a crash than those in the first few rows up front.

    The percentages in the above graphic are survival rates.

    [Thanks, Tim]

  • Why Did Andy Dufresne Escape from Shawshank?

    May 8, 2008  |  Statistics

    If I were to skip straight to the part in The Shawshank Redemption when Andy Dufresne climbs out of the pipe of poo (and put it on mute), someone who never saw the movie might see an escaped convict who steals money from a warden and flees to some random place in Mexico called Zihuatanejo. Out of grief, the warden kills himself and Ellis Boyd "Red" Redding eventually teams up with Andy to commit more crimes.

    Those of us who have seen the movie, though, know this isn't the case. Why? Because we saw the whole movie and have context.

    Context Matters

    As Andrew, a FlowingData reader, put it, "For statistics to be useful, it needs to be explained in a context." When I get my hands on some data, whether I'm analyzing or visualizing, I want to know the context of the data first. I want to know who collected the data, how it was collected, when it was collected, and what was done to it before it arrived in my hands. Without that meta-information, I could easily make an incorrect assumption about the data or misrepresent it somehow in a visualization - which is very bad.

    Simply put, we use visualization and statistics to tell stories with data. If we don't have all the information, then we can't tell a complete story.

  • Data and Statistics For Human Rights

    April 27, 2008  |  Statistics

    Patrick Ball, a human rights statistician, finds truth in numbers while analyzing and consulting to find patterns and uncover scale in crimes against humanity.

    The tension started in the witness room. "You could feel the stress rolling off the walls in there," Patrick Ball remembers. "I can remember realizing that this is why lawyers wear sport coats – you can't see all the sweat on their arms and back." He was, you could say, a little nervous to be cross-examined by Slobodan Milosevic.

    Mr. Ball was the first expert witness called in the case against the former Serbian president, who was representing himself against mass atrocity charges at the International Criminal Tribunal for Yugoslavia. Ball had spent 10 months crunching numbers about migration patterns in the former Yugoslav province of Kosovo; his findings suggested that hundreds of thousands of refugees who fled to Albania were spurred by the violence of Mr. Milosevic's army. By the time Ball entered the tribunal chamber, in March 2002, the ousted leader had a reputation for grand orations rather than direct questions; when Milosevic veered off track, the judge would interrupt. "Milosevic would say, 'Dobro,' and go on...." Ball remembers. "It means, 'OK, very well,' but it was clearly a, 'Very well, we'll have you shot later.' I hear [that] in my dreams periodically."

    Ball is a statistician – not exactly a profession usually associated with human rights defense. But the Human Rights Data Analysis Group that he heads at Benetech, a technology company with a social justice focus, is bringing the power of quantitative analysis to a field otherwise full of anecdote.

    That's right. Statistics is awesome. I dare you to disagree.

    [via Statistical Modeling]

  • Atheist Statistics For 2008 – Do You Believe These?

    April 16, 2008  |  Mistaken Data

    This video shows statistics centered around atheism, claiming that atheism is correlated with a healthy society. I don't want to turn this into a religious debate, but I really don't like these types of videos, slide shows, etc. It's not the ideas that bother me; it's that some people think it's a great idea to rattle off a bunch of numbers to "prove" a point. Never mind the biases, invalid studies, poor analysis, cruddy data, and "results" taken out of context.

    What do you think? Do you buy this stuff?

  • Reflecting on Life After Statistics – R.I.P. Minghui Yu

    April 12, 2008  |  Statistics

    Rachel, one of the organizers of Columbia's Life After Statistics, reflects on lessons learned from the conference and pays respects to a fellow statistician who was lost the night of the event.

    As one of the organizers of the event, Life After a Statistics Doctoral Program (a conference organized by the doctoral students in Columbia's Statistics Department), I was excited to be invited to guest post on Nathan's blog but then realized that my perception of the event would be so different than that of an attendee that perhaps I shouldn't. Two post-docs from Columbia's Statistics department, Matt and Kenny, agreed that they would post and they did -- once on Andrew Gelman's blog and once on Nathan's.
    Continue Reading

  • What Can You Do With a Degree In Statistics? – A Follow Up

    April 9, 2008  |  Statistics

    This past Friday, Columbia University stat graduate students hosted a symposium on careers for students in statistics. Kenneth Shirley, a stat post-doc, was nice enough to write this guest post about the conference so that we can all learn from it. There were two panels - academic and industry - including representation from Google, AT&T, and Pfizer.

    Yesterday's conference at Columbia about career opportunities for Statistics Ph.D. graduates was a great success. It was organized by the graduate students in Columbia’s Stats department and advertised on the web here:

    http://www.stat.columbia.edu/career_conf08/

    Andrew Gelman made some opening remarks, and then there were two panel discussions, each with five professional statisticians. The first panel consisted of academic statisticians, and the second panel consisted of industry statisticians. Here are some comments I found interesting.
    Continue Reading

  • World Internet City-to-City Connections and Density Maps

    April 1, 2008  |  Data Sources, Mapping

    Chris Harrison put together a series of Internet maps that show how cities are interconnected by router configuration. Similar to Aaron Koblin's Flight Patterns, Chris chose to map only the data, which makes an image that looks a lot like strands of silk stretched from city to city. With these maps, viewers gain a sense of connectivity in the world - and as expected the U.S. and Europe are a lot brighter than the rest.
    Continue Reading

  • Translating Data Into Information that Changes Us

    March 28, 2008  |  Statistics

    Wondering what statistics is for? This is what.

    Data are a whole lot of meaningful patterns. We can generate data indefinitely, we can exchange data forever... we can store data, retrieve data and file them away. All this is great fun and maybe useful, maybe lucrative, but we have to ask why. The purpose is regulation and that means translating data into information. Information is what changes us. My purpose is to effect change - to impart information.

    Platform for Change by Stafford Beer

  • What Are You Going to Do With Your PhD in Statistics?

    March 26, 2008  |  Statistics

    Statistics graduate students at Columbia University are hosting a symposium on careers for PhDs in statistics.

    Current confirmed speakers include industry statisticians at Google, AT&T Labs-Research, National Institutes of Health, and Pfizer, Inc and academic statisticians from statistics, marketing, and biostatistics departments at Columbia University, University of Pennsylvania and Rutgers University.

    The Symposium will be held at Columbia University in New York on April 4, 2008 from 1-5pm. A wine and hors d'oeuvre reception will follow so that there will be ample time to chat informally with our guests, and a student mixer after that is also in the works.

    The conference is free and they're offering a $40 travel reimbursement for students who would like to attend. Consider going if you're in the area. It should be interesting. Here's the online registration.

    If anyone actually does end up going, let me know. I'd love for you to share your experience here. For the current and future stat PhDs or master's students, what are you doing or planning to do with your degree? Other than framing it, I'm still searching for my answer.

    [via Statistical Modeling]

  • Data is Going to Change How and Where You Drive – Dash GPS Navigation

    March 24, 2008  |  Statistics

    Dash, an Internet-connected GPS device, is going to change the way you drive by making use of traffic data. Where does the data come from? Well, that's the best part.
    Continue Reading

  • Six Years of Piracy Data Available for Download – Shiver Me Timbers

    March 18, 2008  |  Data Sources

    bootleg-china

    I stumbled across this dataset covering piracy of Oscar-nominated films over the last 6 years and a short analysis.

    Despite the Academy's efforts to crack down on bootlegging, its attempts haven't done a whole lot. Focus on stopping one area, like downloading, and another area just grows more prolific, like Region 5 DVDs from overseas. A quick search in the right places will show you that piracy isn't going away any time soon.

    I even met someone whose job it was to find people who were "seeding" films through BitTorrent and to report them to police. I got the impression that it was a really tedious process and people go uncaught most of the time. I'm uh, not condoning this, but if you don't want to get caught, just make sure you stop the torrent once you've got your file.

    Bootlegging on Seinfeld

    Bootlegging always reminds me of the Seinfeld episode when Jerry somehow gets caught up in a bootlegging scheme:

    [T]here was a kid couldn't have been more than ten years old. He was asking a street vendor if he had any other bootlegs as good as Death Blow. That's who I care about. The little kid who needs bootlegs, because his parent or guardian won't let him see the excessive violence and strong sexual content you and I take for granted.

    For those interested (and I know you are), the term bootleg originates from hiding flasks of liquor in the legging of boots. Ahoy, matey.

    Photo by mumelopics

  • 10 Largest Data Breaches Since 2000 – Millions Affected

    March 14, 2008  |  Data Sources

    In light of the MySpace photo breach (due to their negligence) a couple of months ago, I got to wondering about other recent data breaches. It turns out Attrition.org keeps a Data Loss Archive and Database that contains known data breaches since 2000. Records include date, number affected, groups involved, summaries, and links to reported stories and updates. It's surprisingly detailed and even better, it's all available for download.

    The above graphic shows the 10 largest data breaches which affected millions. I thought the 800,000 records thieved from UCLA a couple of years ago (that my information was unfortunately a part of) was a lot. That's nothing compared to these.

    Notice the higher frequency as we get closer to the present?

    [Thanks Ryan | Welcome, Boing Boing readers]

  • A World of Information – United Nations Data Just Became Accessible

    March 7, 2008  |  Data Sources

    For our Humanflows project, we used the United Nations Common Database for our demographic numbers. Anyone who has used the common database knows that it's not especially user-friendly. You have to go through a series of non-intuitive dropdown menus to get the data you want. You then have to decipher the downloaded data's CSV format. The recently released UNdata relieves a lot of these problems.
    Continue Reading

  • Estimate Financial Impact of Risk and Uncertainty for a Living

    March 1, 2008  |  Data Sources

    I stumbled across a data table from the Social Security Administration that shows the probability of death. It's an actuarial life table estimating the probability that you will die within one year given your age.
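
    For a rough idea of how a table like this gets used, here is a minimal sketch that chains one-year death probabilities into the chance of surviving several years in a row. The function name and the numbers are made up for illustration; they are not values from the Social Security Administration table.

```python
def survival_curve(q_by_age, start_age, years):
    """Chain one-year death probabilities into cumulative survival.

    q_by_age maps an age to the probability of dying within one year
    at that age. Returns a list whose k-th entry is the probability of
    surviving k + 1 consecutive years starting from start_age.
    """
    alive = 1.0
    curve = []
    for age in range(start_age, start_age + years):
        # Surviving another year means not dying at this age.
        alive *= 1.0 - q_by_age[age]
        curve.append(alive)
    return curve


# Hypothetical death probabilities, NOT taken from the actual life table.
q_by_age = {30: 0.001, 31: 0.002, 32: 0.004}
curve = survival_curve(q_by_age, start_age=30, years=3)
print(curve)
```

    The only assumption baked in is that each year's probability is conditional on having reached that age, which is how a one-year death probability is defined in the excerpt above.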
    Continue Reading

  • Rambo Kill Counts From Parts I, II, III, and IV

    February 22, 2008  |  Data Sources

    rambo-kill-chart

    I don't think I've seen a single Rambo all the way through, nor do I remember the premise of any of the movies, but I still found these kill counts amusing. Notice the near doubling of deaths with each sequel. Yo, Adrian!!! Yeah, I know, wrong movie, but come on, is there really a difference?

    Here's a graph showing kill counts (mostly for my own entertainment):

    Rambo Kill Counts Graph

    Mr. Rambo may have gotten more violent in the latest installment, but it looks like he also grew more modest.

    [via Geekstir]

  • Comparing Roger Clemens to Hall of Fame Pitchers

    February 11, 2008  |  Statistics

    Andrew had some comments about the graphs on Freakonomics that showed a seemingly odd "change of fortune" for Roger Clemens.

    Roger Clemens - NYT

    You can see that Clemens almost followed an opposite pattern from all other pitchers in the league. As Andrew notes though, there seems to be a lot riding on the quadratic fit and average values when we know that Clemens has been anything but ordinary throughout his long career.

    Graphing Without Smoothing

    For fun, I tried graphing the ERA data for Clemens against the ERAs for the 16 most recent Hall of Fame pitchers (that I could get data for). My thinking was that the Hall of Famers' performances might be a better indicator of what should be "normal" for great pitchers. The results are a little less compelling. However, one thing to note is that most players who played past age 40 saw an increase in ERA while Clemens had a pretty significant improvement in ERA from age 40 to 43.

    Whether this is due to performance-enhancing drugs or just a change in pitching strategy, coaching, or some other factor, I can't say. There are probably only a few people who can know for sure.

    Anyways, if anyone has a different take on the data, I'd love to hear it in the comments.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.