• What Can You Do With a Degree In Statistics? – A Follow Up

    April 9, 2008  |  Statistics

    This past Friday, Columbia University stat graduate students hosted a symposium on careers for students in statistics. Kenneth Shirley, a stat postdoc, was nice enough to write this guest post about the conference so that we can all learn from it. There were two panels, academic and industry, including representation from Google, AT&T, and Pfizer.

    Yesterday's conference at Columbia about career opportunities for Statistics Ph.D. graduates was a great success. It was organized by the graduate students in Columbia’s Stats department and advertised on the web here:

    http://www.stat.columbia.edu/career_conf08/

    Andrew Gelman made some opening remarks, and then there were two panel discussions, each with five professional statisticians. The first panel consisted of academic statisticians, and the second panel consisted of industry statisticians. Here are some comments I found interesting.

  • World Internet City-to-City Connections and Density Maps

    April 1, 2008  |  Data Sources, Mapping

    Chris Harrison put together a series of Internet maps that show how cities are interconnected by router configuration. Similar to Aaron Koblin's Flight Patterns, Chris chose to map only the data, which makes an image that looks a lot like strands of silk stretched from city to city. With these maps, viewers gain a sense of connectivity in the world - and as expected the U.S. and Europe are a lot brighter than the rest.

  • Translating Data Into Information that Changes Us

    March 28, 2008  |  Statistics

    Wondering what statistics is for? This is what.

    Data are a whole lot of meaningful patterns. We can generate data indefinitely, we can exchange data forever... we can store data, retrieve data and file them away. All this is great fun and maybe useful, maybe lucrative, but we have to ask why. The purpose is regulation and that means translating data into information. Information is what changes us. My purpose is to effect change - to impart information.

    Platform for Change by Stafford Beer

  • What Are You Going to Do With Your PhD in Statistics?

    March 26, 2008  |  Statistics

    Statistics graduate students at Columbia University are hosting a symposium on careers for PhDs in statistics.

    Current confirmed speakers include industry statisticians at Google, AT&T Labs-Research, National Institutes of Health, and Pfizer, Inc and academic statisticians from statistics, marketing, and biostatistics departments at Columbia University, University of Pennsylvania and Rutgers University.

    The Symposium will be held at Columbia University in New York on April 4, 2008 from 1-5pm. A wine and hors d'oeuvre reception will follow so that there will be ample time to chat informally with our guests, and a student mixer after that is also in the works.

    The conference is free and they're offering a $40 travel reimbursement for students who would like to attend. Consider going if you're in the area. It should be interesting. Here's the online registration.

    If anyone actually does end up going, let me know. I'd love for you to share your experience here. For the current and future stat PhD or master's students, what are you doing or planning to do with your degree? Other than framing it, I'm still searching for my answer.

    [via Statistical Modeling]

  • Data is Going to Change How and Where You Drive – Dash GPS Navigation

    March 24, 2008  |  Statistics

    Dash, an Internet-connected GPS device, is going to change the way you drive by making use of traffic data. Where does the data come from? Well, that's the best part.

  • Six Years of Piracy Data Available for Download – Shiver Me Timbers

    March 18, 2008  |  Data Sources

    I stumbled across this dataset covering piracy of Oscar-nominated films over the last 6 years and a short analysis.

    Piracy by the Numbers

    Despite the Academy's efforts to crack down on bootlegging, its attempts haven't done a whole lot. Focus on stopping one area, like downloading, and another area, like Region 5 DVDs from overseas, just grows more prolific. A quick search in the right places will show you that piracy isn't going away any time soon.

    I even met someone whose job it was to find people who were "seeding" films over BitTorrent and to report them to police. I got the impression that it was a really tedious process and people go uncaught most of the time. I'm uh, not condoning this, but if you don't want to get caught, just make sure you stop the torrent once you've got your file.

    Bootlegging on Seinfeld

    Bootlegging always reminds me of the Seinfeld episode when Jerry somehow gets caught up in a bootlegging scheme:

    [T]here was a kid couldn't have been more than ten years old. He was asking a street vendor if he had any other bootlegs as good as Death Blow. That's who I care about. The little kid who needs bootlegs, because his parent or guardian won't let him see the excessive violence and strong sexual content you and I take for granted.

    For those interested (and I know you are), the term bootleg originates from hiding flasks of liquor in the legging of boots. Ahoy, matey.

    Photo by mumelopics

  • 10 Largest Data Breaches Since 2000 – Millions Affected

    March 14, 2008  |  Data Sources

    In light of the MySpace photo breach (due to their negligence) a couple of months ago, I got to wondering about other recent data breaches. It turns out Attrition.org keeps a Data Loss Archive and Database that contains known data breaches since 2000. Records include date, number affected, groups involved, summaries, and links to reported stories and updates. It's surprisingly detailed and even better, it's all available for download.

    The above graphic shows the 10 largest data breaches which affected millions. I thought the 800,000 records thieved from UCLA a couple of years ago (that my information was unfortunately a part of) was a lot. That's nothing compared to these.

    Notice the higher frequency as we get closer to the present?

    [Thanks Ryan | Welcome, Boing Boing readers]

  • A World of Information – United Nations Data Just Became Accessible

    March 7, 2008  |  Data Sources

    For our Humanflows project, we used the United Nations Common Database for our demographic numbers. Anyone who has used the common database knows that it's not especially user-friendly. You have to go through a series of non-intuitive dropdown menus to get the data you want. You then have to decipher the downloaded data's CSV format. The recently released UNdata relieves a lot of these problems.

  • Estimate Financial Impact of Risk and Uncertainty for a Living

    March 1, 2008  |  Data Sources

    I stumbled across a data table from the Social Security Administration that shows the probability of death. It's an actuarial life table estimating the probability that you will die within one year given your age.

  • Rambo Kill Counts From Parts I, II, III, and IV

    February 22, 2008  |  Data Sources

    I don't think I've seen a single Rambo all the way through nor do I remember the premise of any of the movies, but I still found these kill counts amusing. Notice the near doubling of deaths each sequel. Yo, Adrian!!! Yeah, I know, wrong movie, but come on, is there really a difference?

    Here's a graph showing kill counts (mostly for my own entertainment):

    Rambo Kill Counts Graph

    Mr. Rambo may have gotten more violent in the latest installment, but it looks like he also grew more modest.

    [via Geekstir]

  • Comparing Roger Clemens to Hall of Fame Pitchers

    February 11, 2008  |  Statistics

    Andrew had some comments about the graphs on Freakonomics that showed a seemingly odd "change of fortune" for Roger Clemens.

    Roger Clemens - NYT

    You can see that Clemens almost followed an opposite pattern from all other pitchers in the league. As Andrew notes though, there seems to be a lot riding on the quadratic fit and average values when we know that Clemens has been anything but ordinary throughout his long career.

    Graphing Without Smoothing

    For fun, I tried graphing the ERA data for Clemens against the ERAs for the 16 most recent hall of fame pitchers (that I could get data for). My thinking was the hall-of-famer performances might be a better indicator of what should be "normal" for great pitchers. The results are a little less compelling. However, one thing to note is that most players who played past age 40 saw an increase in ERA while Clemens had a pretty significant improvement in ERA from age 40 to 43.

    Whether this is due to performance-enhancing drugs or just a change in pitching strategy, coaching, or some other factor, I can't say. There are probably only a few people who can know for sure.

    Anyways, if anyone has a different take on the data, I'd love to hear it in the comments.

  • Speed Dating Data – Attractiveness, Sincerity, Intelligence, Hobbies

    February 6, 2008  |  Data Sources

    In their paper Gender Differences in Mate Selection: Evidence from a Speed Dating Experiment, Fisman et al. had a bit of fun with a speed dating dataset. Here's what they found:

    Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.

    The dataset is substantial, with over 8,000 observations for answers to twenty-something survey questions. With questions like How do you measure up? and What do you look for in the opposite sex?, this dataset is definitely high on human element and should be fun to play with.

    [via Statistical Modeling]

  • Tap Into the Wisdom of Crowds, Make Money by Predicting Future Events

    February 5, 2008  |  Social Data Analysis

    Predictify takes James Surowiecki's The Wisdom of Crowds to heart. Surowiecki argues that when certain factors are present (for example, group diversity), then the group is always smarter than the individual. Predictify has turned this "principle" into a money-making platform.

  • Who’s Going to Win Super Bowl XLII?

    February 3, 2008  |  Statistics

    I just put down $20 on today's game for the New York Giants to cover the 12-point spread. Of course, knowing me, I got to thinking how that betting line is decided. Is there one person who calculates the spread? Do Las Vegas casinos just put up numbers based on past experiences? I did a little bit of research, and here's what I found.

  • Weekend Minis – Government, Environment & Angry Employee

    February 2, 2008  |  Data Sources

    FedStats - Provides access to the full range of official statistical information produced by the Federal Government, including population, education, crime, and health care.

    MAPLight - A detailed database that brings together information on campaign contributions and votes in the California legislature. Check out the video tour.

    EarthTrends - A collection of information regarding the environmental, social, and economic trends that shape our world.

    Angry Employee Deletes All of Company's Data - A woman about to "lose" her job goes to the office at night and deletes 7 years' worth of data. Can we say backup, please?

  • Bad Statistics Leads to Poor Results and a Questionable Trial Verdict

    February 1, 2008  |  Mistaken Data

    Peter Donnelly talks about the misuse of statistics in his TED talk from a couple of years back. The first two-thirds of the talk is an introduction to probability and its role in genetics, which, admittedly, didn't hold much of my interest. The last third, however, gets a lot more interesting.

    Donnelly talks about a British woman who was wrongly convicted in large part because of a misuse of statistics. A so-called expert cited how improbable it would be for two children to die of sudden infant death syndrome, but it turns out that "expert" was making incorrect assumptions about the data. This doesn't surprise me since it happens all the time.

    Lesson Learned

    People misuse statistics every day (intentionally and unintentionally), and oftentimes it doesn't hurt much (which doesn't make it any better), but in this case improper use directly affected someone's life in a very big way. One of the most common assumptions I see is that every observation is independent, which often is not the case. As a simple example, if it's raining today, does that change the probability that it will rain tomorrow? What if it didn't rain today?

    In other words, the next time you're thinking of making up or tweaking data, don't; and the next time you need to analyze some data but aren't sure how, ask for some help. Statisticians are nice and oh so awesome.
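
    To make the rain question above concrete, here's a minimal Python sketch. The daily record is completely made up for illustration (it is not real weather data), and the function name is my own. It compares the overall chance of rain with the chance of rain given what happened the day before. If days were independent, all three numbers would be about the same.

```python
from collections import Counter

# Hypothetical 30-day record, invented for illustration: R = rain, D = dry.
days = "RRRDDDDRRDDDDDRRRRDDDDDDRRDDDD"


def rain_probabilities(record):
    """Return P(rain), P(rain tomorrow | rain today), P(rain tomorrow | dry today)."""
    # Count each (today, tomorrow) pair of consecutive days.
    pairs = Counter(zip(record, record[1:]))

    # Overall fraction of rainy days, ignoring the day before.
    p_rain = record.count("R") / len(record)

    # Number of days (with a following day) that were rainy or dry.
    after_rain = pairs[("R", "R")] + pairs[("R", "D")]
    after_dry = pairs[("D", "R")] + pairs[("D", "D")]

    # Conditional frequencies of rain on the following day.
    p_rain_given_rain = pairs[("R", "R")] / after_rain
    p_rain_given_dry = pairs[("D", "R")] / after_dry
    return p_rain, p_rain_given_rain, p_rain_given_dry


p, p_rr, p_dr = rain_probabilities(days)
print(f"P(rain)                       = {p:.2f}")
print(f"P(rain tomorrow | rain today) = {p_rr:.2f}")
print(f"P(rain tomorrow | dry today)  = {p_dr:.2f}")
```

    In this made-up record, rain is much more likely after a rainy day than after a dry one, so treating the days as independent and multiplying probabilities together would give the wrong answer. Real data would of course need a lot more than 30 days.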

    Here's Donnelly's talk:

  • Journal of Quantitative Analysis in Sports is Live

    January 30, 2008  |  Statistics

    Whenever I tell people that I study Statistics, they almost always respond, "So what do you do with that?" After they get over their initial shock, I often get, "If I were in Statistics, I'd study sports statistics." I usually respond by telling them that while it would probably be a lot of fun, I don't think there is much money in it (because I gotta eat, right?) and that statisticians usually take that as a part-time gig. I'm thinking I might have to change that response though, as the game of sports statistics is showing signs of life with the recent Journal of Quantitative Analysis in Sports.

    Articles in the Journal of Quantitative Analysis in Sports (JQAS) come from a wide variety of sports and perspectives and deal with such subjects as tournament structure, frequency and occurrence of records and the optimal focus of training for decathlons. Additionally, the journal serves as an outlet for professionals in the sports world to raise issues and ask questions that relate to quantitative sports analysis. Edited by economist Benjamin Alamar, articles come from a diverse set of disciplines including statistics, operations research, economics, psychology, sports management and business.

    Maybe I'll read regularly and take up sports betting as my new hobby.

  • Walker Tracker – A Community Site for Pedometer Fans

    January 23, 2008  |  Data Sharing

    Those of you who have been around since the beginning know that I am just obsessed with my pedometer. Lately, though, I haven't felt inclined to go for a winter stroll in the below-freezing weather. When I was keeping track of my steps though, one of the difficulties was staying consistent. Sometimes I would forget to wear my pedometer, while other times I would forget to record my steps.

    I imagine Walker Tracker could help a bit in solving that second problem. I know it was always easier to make it to the gym when I knew one of my friends was going to meet me there. Walker Tracker is like that friend at the gym. The site lets you keep track of your steps as well as see how others are doing.

    We're trying to change the world. We're trying to get you and us and everyone we know off the elevator and out of the car and onto the sidewalks and trails. We're doing it one step at a time.
    Get up, stand up and walk.

    OK, maybe it's a little hoorah, but if you feel like actually accomplishing a new year's resolution this year, Walker Tracker could be a good place to start.

    [via Web Worker Daily]

  • Google Decides to Host a Whole Lot of Scientific Data – Palimpsest Project

    January 21, 2008  |  Data Sources

    In its continued efforts for absolute power over all information ever created in the world, Google will be hosting open-source scientific datasets at its research section. Here are the presentation slides from Google's Jon Trowbridge:

    In the next few weeks, terabytes of data will be made available to the public. For example, all 120 terabytes of Hubble Space Telescope data is going to be online. That's kind of cool but kind of scary too. Such a large amount of data is bound to affect lots of people on many different levels.

    For scientists, data will be available for deeper research. For the scientists who generated the data, their research could be placed under more critical scrutiny. Existing data applications might be eclipsed by the data giant, or it could go the other way such that the general public grows more aware of data-type things. Mashups will in turn spring up as well as more visualization, I am sure.

    All of this Doesn't Matter If...

    Of course, all of this depends on what data end up on the Google servers and how easily accessible the data are. Knowing Google, I don't think accessibility will be a problem. Getting data will be the super hard part. Who will be willing to contribute their data? What type of data will get contributed? Will it be the good, raw data or more cleaned and processed data? Do researchers even want to share their data with the rest of the world?

    It's going to be interesting to see what goes up on Google Research in these coming weeks.

    [via Wired and Pimm]

  • Iraq Body Count: A Human Security Project

    January 17, 2008  |  Data Sources

    Iraq Body Count keeps track of civilian deaths by cross-checking media reports and hospital, morgue, and NGO figures. Along with a widget counter that you can post on your blog or site, IBC also makes their database available for download.

    Systematically extracted details about deadly incidents and the individuals killed in them are stored with every entry in the database. The minimum details always extracted are the number killed, where, and when.

    The data comes in two sets -- incident reports and individuals who have lost their lives -- in the form of CSV files.

    The data is a little depressing, but still very necessary.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.