This video shows statistics centered around atheism, claiming that atheism is correlated with a healthy society. I don't want to turn this into a religious debate, but I really don't like these types of videos, slide shows, etc. It's not the ideas that bother me; it's that some people think it's a great idea to rattle off a bunch of numbers to "prove" a point. Never mind the biases, invalid studies, poor analysis, cruddy data, and "results" taken out of context.
As one of the organizers of the event, Life After a Statistics Doctoral Program (a conference organized by the doctoral students in Columbia's Statistics Department), I was excited to be invited to guest post on Nathan's blog but then realized that my perception of the event would be so different than that of an attendee that perhaps I shouldn't. Two post-docs from Columbia's Statistics department, Matt and Kenny, agreed that they would post and they did -- once on Andrew Gelman's blog and once on Nathan's. Continue Reading
This past Friday, Columbia University stat graduate students hosted a symposium on careers for students in statistics. Kenneth Shirley, a stat post doc, was nice enough to write this guest post about the conference so that we can all learn from it. There were two panels - academic and industry - including representation from Google, AT&T, and Pfizer.
Yesterday's conference at Columbia about career opportunities for Statistics Ph.D. graduates was a great success. It was organized by the graduate students in Columbia's Stats department and advertised on the web here:
Andrew Gelman made some opening remarks, and then there were two panel discussions, each with five professional statisticians. The first panel consisted of academic statisticians, and the second panel consisted of industry statisticians. Here are some comments I found interesting. Continue Reading
Chris Harrison put together a series of Internet maps that show how cities are interconnected by router configuration. Similar to Aaron Koblin's Flight Patterns, Chris chose to map only the data, which makes an image that looks a lot like strands of silk stretched from city to city. With these maps, viewers gain a sense of connectivity in the world - and as expected the U.S. and Europe are a lot brighter than the rest. Continue Reading
Data are a whole lot of meaningful patterns. We can generate data indefinitely, we can exchange data forever... we can store data, retrieve data and file them away. All this is great fun and maybe useful, maybe lucrative, but we have to ask why. The purpose is regulation and that means translating data into information. Information is what changes us. My purpose is to effect change - to impart information.
Current confirmed speakers include industry statisticians at Google, AT&T Labs-Research, National Institutes of Health, and Pfizer, Inc and academic statisticians from statistics, marketing, and biostatistics departments at Columbia University, University of Pennsylvania and Rutgers University.
The Symposium will be held at Columbia University in New York on April 4, 2008 from 1-5pm. A wine and hors d'oeuvre reception will follow so that there will be ample time to chat informally with our guests, and a student mixer after that is also in the works.
The conference is free and they're offering a $40 travel reimbursement for students who would like to attend. Consider going if you're in the area. It should be interesting. Here's the online registration.
If anyone actually does end up going, let me know. I'd love for you to share your experience here. For the current and future stat PhDs or masters students, what are you doing or planning to do with your degree? Other than framing it, I'm still searching for my answer.
The Academy's efforts to crack down on bootlegging haven't done a whole lot. Focus on stopping one area, like downloading, and another area, like Region 5 DVDs from overseas, just grows more prolific. A quick search in the right places will show you that piracy isn't going away any time soon.
I even met someone whose job it was to find people who were "seeding" films through bit torrents and to report them to police. I got the impression that it was a really tedious process and people go uncaught most of the time. I'm uh, not condoning this, but if you don't want to get caught, just make sure you stop the torrent once you've got your file.
Bootlegging on Seinfeld
Bootlegging always reminds me of the Seinfeld episode when Jerry somehow gets caught up in a bootlegging scheme:
[T]here was a kid couldn't have been more than ten years old. He was asking a street vendor if he had any other bootlegs as good as Death Blow. That's who I care about. The little kid who needs bootlegs, because his parent or guardian won't let him see the excessive violence and strong sexual content you and I take for granted.
For those interested (and I know you are), the term bootleg originates from hiding flasks of liquor in the legging of boots. Ahoy, matey.
In light of the MySpace photo breach (due to their negligence) a couple of months ago, I got to wondering about other recent data breaches. It turns out Attrition.org keeps a Data Loss Archive and Database that contains known data breaches since 2000. Records include date, number affected, groups involved, summaries, and links to reported stories and updates. It's surprisingly detailed and even better, it's all available for download.
The above graphic shows the 10 largest data breaches which affected millions. I thought the 800,000 records thieved from UCLA a couple of years ago (that my information was unfortunately a part of) was a lot. That's nothing compared to these.
Notice the higher frequency as we get closer to the present?
For our Humanflows project, we used the United Nations Common Database for our demographic numbers. Anyone who has used the common database knows that it's not especially user-friendly. You have to go through a series of non-intuitive dropdown menus to get the data you want. You then have to decipher the downloaded data's CSV format. The recently released UNdata relieves a lot of these problems. Continue Reading
I stumbled across a data table from the Social Security Administration that shows the probability of death. It's an actuarial life table estimating the probability that you will die within one year given your age. Continue Reading
I don't think I've seen a single Rambo all the way through nor do I remember the premise of any of the movies, but I still found these kill counts amusing. Notice the near doubling of deaths each sequel. Yo, Adrian!!! Yeah, I know, wrong movie, but come on, is there really a difference?
Here's a graph showing kill counts (mostly for my own entertainment):
Mr. Rambo may have gotten more violent in the latest installment, but it looks like he also grew more modest.
Andrew had some comments about the graphs on Freakonomics that showed a seemingly odd "change of fortune" for Roger Clemens.
You can see that Clemens almost followed an opposite pattern from all other pitchers in the league. As Andrew notes though, there seems to be a lot riding on the quadratic fit and average values when we know that Clemens has been anything but ordinary throughout his long career.
Graphing Without Smoothing
For fun, I tried graphing the ERA data for Clemens against the ERAs for the 16 most recent hall of fame pitchers (that I could get data for). My thinking was the hall-of-famer performances might be a better indicator of what should be "normal" for great pitchers. The results are a little less compelling. However, one thing to note is that most players who played past age 40 saw an increase in ERA while Clemens had a pretty significant improvement in ERA from age 40 to 43.
Whether this is due to performance-enhancing drugs or just a change in pitching strategy, coaching, or some other factor, I can't say. There are probably only a few people who can know for sure.
Anyways, if anyone has a different take on the data, I'd love to hear it in the comments.
Women put greater weight on the intelligence and the race of partner, while men respond more to physical attractiveness. Moreover, men do not value women's intelligence or ambition when it exceeds their own. Also, we find that women exhibit a preference for men who grew up in affluent neighborhoods. Finally, male selectivity is invariant to group size, while female selectivity is strongly increasing in group size.
The dataset is substantial with over 8,000 observations for answers to twenty something survey questions. With questions like How do you measure up? and What do you look for in the opposite sex?, this dataset is definitely high on human element and should be fun to play with.
Predictify takes James Surowiecki's The Wisdom of Crowds to heart. Surowiecki argues that when certain factors are present (for example, group diversity), then the group is always smarter than the individual. Predictify has turned this "principle" into a money-making platform. Continue Reading
I just put down $20 on today's game for the New York Giants to cover the 12-point spread. Of course, knowing me, I got to thinking how that betting line is decided. Is there one person who calculates the spread? Do Las Vegas casinos just put up numbers based on past experiences? I did a little bit of research, and here's what I found. Continue Reading
Peter Donnelly talks about the misuse of statistics in his TED talk a couple of years back. The first 2/3 of the talk is an introduction to probability and its role in genetics, which admittedly, didn't get much of my interest. The last third, however, gets a lot more interesting.
Donnelly talks about a British woman who was wrongly convicted in large part because of a misuse of statistics. A so-called expert cited how improbable it would be for two children to die of sudden infant death syndrome, but it turns out that "expert" was making incorrect assumptions about the data. This doesn't surprise me since it happens all the time.
People misuse statistics every day (intentionally and unintentionally), and oftentimes it doesn't hurt much (which doesn't make it any better), but in this case improper use directly affected someone's life in a very big way. One of the most common assumptions I see is that every observation is independent, which often is not the case. As a simple example, if it's raining today, does that change the probability that it will rain tomorrow? What if it didn't rain today?
In other words, the next time you're thinking of making up or tweaking data, don't; and the next time you need to analyze some data but aren't sure how, ask for some help. Statisticians are nice and oh so awesome.
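To make the independence point concrete, here is a minimal sketch in Python using the rainy-day example. The two probabilities are made up purely for illustration (they are not real weather data), and the function names are my own. It compares the chance of two rainy days in a row when you wrongly treat the days as independent against the chance when rain today makes rain tomorrow more likely.

```python
# Hypothetical numbers, chosen only for illustration; not real weather data.
P_RAIN = 0.2             # assumed chance of rain on any given day
P_RAIN_GIVEN_RAIN = 0.6  # assumed chance of rain tomorrow, given rain today


def p_two_rainy_days_independent(p_rain):
    """P(rain today and tomorrow) if the two days were independent."""
    return p_rain * p_rain


def p_two_rainy_days_dependent(p_rain, p_rain_given_rain):
    """P(rain today and tomorrow) using the conditional probability."""
    return p_rain * p_rain_given_rain


independent = p_two_rainy_days_independent(P_RAIN)
dependent = p_two_rainy_days_dependent(P_RAIN, P_RAIN_GIVEN_RAIN)
ratio = dependent / independent

print(f"Assuming independence: {independent:.3f}")
print(f"Allowing dependence:   {dependent:.3f}")
print(f"Independence understates the chance by a factor of {ratio:.1f}")
```

With these made-up inputs, simply multiplying the marginal probabilities understates the chance of back-to-back rain. The same kind of error, multiplying probabilities for events that may not be independent, is what Donnelly describes in the court case.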
Whenever I tell people that I study Statistics, they almost always respond, "So what do you do with that?" After they get over their initial shock, I often get, "If I were in Statistics, I'd study sports statistics." I usually respond by telling them that while it would probably be a lot of fun, I don't think there is much money in it (because I gotta eat, right?) and that statisticians usually take that as a part time gig. I'm thinking I might have to change that response though, as the game of sports statistics is showing signs of life with the recent Journal of Quantitative Analysis in Sports.
Articles in the Journal of Quantitative Analysis in Sports (JQAS) come from a wide variety of sports and perspectives and deal with such subjects as tournament structure, frequency and occurrence of records and the optimal focus of training for decathlons. Additionally, the journal serves as an outlet for professionals in the sports world to raise issues and ask questions that relate to quantitative sports analysis. Edited by economist Benjamin Alamar, articles come from a diverse set of disciplines including statistics, operations research, economics, psychology, sports management and business.
Maybe I'll read regularly and take up sports betting as my new hobby.
Those of you who have been around since the beginning know that I am just obsessed with my pedometer. Lately, I admit, I haven't felt inclined to go for a winter stroll in the below-freezing weather. When I was keeping track of my steps though, one of the difficulties was staying consistent. Sometimes I would forget to wear my pedometer, while other times I would forget to record my steps.
I imagine Walker Tracker could help a bit in solving that second problem. I know it was always easier to make it to the gym when I knew one of my friends was going to meet me there. Walker Tracker is like that friend at the gym. The site lets you keep track of your steps as well as see how others are doing.
We're trying to change the world. We're trying to get you and us and everyone we know off the elevator and out of the car and onto the sidewalks and trails. We're doing it one step at a time.
Get up, stand up and walk.
OK, maybe it's a little hoorah, but if you feel like actually accomplishing a new year's resolution this year, Walker Tracker could be a good place to start.