  • Access Restrictions on the Release of Gun Sales Data

    October 24, 2007  |  Data Sharing

    I just found this in my draft folder from a while back. It's kind of old news, but I think it's still worth mentioning.

    Gun control advocates failed to win access to gun sales data for local governments and law enforcement agencies.

    The House Appropriations Committee defeated two attempts by gun control advocates to strip four-year-old restrictions on the use of information from Bureau of Alcohol, Tobacco, Firearms and Explosives tracing gun sales. The votes were a victory for the National Rifle Association and came despite the Democratic takeover of Congress in January.

    One side argues that gun sales data will help law enforcement agencies track gun dealers who sell guns illegally. The other side argues that there's privacy at stake, and there's a chance that police officers' identities could be inferred. A big victory for gun rights advocates, or so the article might suggest.

    My opinion -- even if gun sales data were given to law enforcement, how could anyone guarantee data integrity? I think it's fair to say that dealers selling guns illegally aren't going to provide accurate reports. Sell a gun under the table with cash, don't report it, and the data doesn't reveal much. Am I missing something here?

  • New Journal: Technology Innovations in Statistics Education

    October 16, 2007  |  Statistics

    TISE Journal Logo

    Technology Innovations in Statistics Education (TISE) is a new e-journal that was just announced yesterday. The use of technology (e.g. data visualization) has become extremely important in teaching statistical concepts to newbies, so this new journal should be really useful. Computers let students explore and experiment in ways they couldn't with just paper and pencil, and TISE explores these alternatives.

    Technology Innovations in Statistics Education (TISE) publishes scholarship on the intersection between technology and statistics education. The current issue includes papers by George Cobb (who challenges the introductory statistics curriculum to radically innovate to adapt to new technology), Beth Chance et al. (who provide an overview of the use of technology to improve student learning), William Finzer et al. (who describe software innovations for improving student access to data), Dani Ben-Zvi (preliminary research results on using Wiki in statistics teaching), Daniel Kaplan (on the role of computation in introductory statistics), and Andee Rubin (an historical overview of technology in statistics education).

    These papers can be read at http://tise.stat.ucla.edu. Please click on the "subscribe" button to join the mailing list and be informed of future releases.

    TISE is seeking scholarly papers for Volume 2 that address any of these themes:

    • Designing technology to improve statistics education
    • Using technology to develop conceptual understanding
    • Teaching the use of technology to gain insight into and access to data

    The first issue is already online. Take a look. I've had the opportunity to work with some of the knowledgeable and active members of the editorial board, so TISE looks to be very promising.

  • Education Statistics Free, Available, and Waiting for You

    October 15, 2007  |  Data Sources

    Raw, fine-grained data is still a bit hard to come by. Summary statistics (i.e. data that came from some analysis), on the other hand, are often easy to find. A lot of the time the data is already online or just a simple phone call away.

    The National Center for Education Statistics, a part of the U.S. Department of Education, offers a bunch of data including, but not at all limited to, poverty and math achievement, average science scores overall and by grade level, and quantitative literacy.
    Continue Reading

  • New York Mets Not Looking So Good

    September 28, 2007  |  Statistics

    New York Mets 1986 and 2007

    I've never really been interested in baseball. I've always been more of a basketball and football fan. However, my summer roommate was a die-hard baseball fan, and I'm convinced that he brainwashed me into rooting for the New York Mets. Just a couple of weeks ago, someone told me he was a Phillies fan, and I let out a blech of disgust without even thinking about it.

    So with the Mets' most recent loss, I'm a bit disgruntled, and I'm sure my old roommate is pissed as can be. The Mets are no longer leading the Phillies for the number one spot in the NL East.

    What better way to see how poorly the Mets are playing than with a graphic? I decided to compare this year's Mets season with the Mets' 1986 World Series winning season, because that should probably be what they're shooting for. As my roommate would angrily exclaim, "If they can't get their #%&$ act together, they don't deserve to go to the playoffs!"

  • Misleading Map of Buffalo Snow

    September 27, 2007  |  Mapping, Mistaken Data

    Buffalo Snowfall Map Without Legend

    I saw this map of the average snow levels in Buffalo. I think I just glanced at it and that was about it. When you first look at the map, what do you make of the colors? When I see green for snow levels, I think no snow. Am I crazy? What do you think?

    So the image was kind of in my head all this summer while I was in NYC. When I told people that I was going back to Buffalo after my internship, they always gave this look that said, "Ha, have fun during the winter," and then they would actually say it and then go into how they measure the snow level by comparing it against a giant pole.
    Continue Reading

  • A Repetitive Hate for Statistics

    September 17, 2007  |  Statistics

    When I tell people that I'm a graduate student in Statistics, there are two responses that I get more than any others. The most popular of the two usually goes something like this.

    Oh man, I hated statistics in college. The professor totally sucked and I never knew what was going on. All I remember is mean and some... curve thing? I don't know. What's standard deviation anyways?

    I threw that standard deviation bit in for effect. No one actually asks about it, and I'm pretty sure most people don't even remember ever hearing about it. It's that whole selective memory thing -- blocking out the bad and remembering only the happies.

    So anyways, every time someone tells me they absolutely hated statistics in college, I die a little inside and start bawling like a two-year-old who's lost her bottle. No, no, I'm kidding, but the first thing I think is, "Gee, thanks for letting me know that! Like I really wanted to know that you hate what I study. You know what? I think I hate you a little bit now." I'm exaggerating a tad, but it's slightly frustrating after hearing it so many times.

    But why do so many people hate statistics?
    Continue Reading

  • Published Data and Results Not Always Legit

    September 15, 2007  |  Statistics

    In a previous life, I thought anything published in an academic journal was legit, but now that I'm a stat student, the story is quite the opposite. Whenever I hear results or see data from some study, I become an instant skeptic.

    Were there really that many deaths from 1998 to 2007? Did housing prices really increase that much over the past decade? Do that many people really support that presidential candidate?
    Continue Reading

  • Second Day of New York Taxi Strikes

    September 6, 2007  |  Data Sharing

    As the second day of the New York taxi strike begins over GPS and credit card technology, I'm left wondering what taxi drivers are making such a big fuss over. First, drivers are complaining that GPS is an invasion of privacy, and second, they argue that credit card transactions will cause a decrease in profits due to credit card fees.

    Starting with the credit card transactions, I'm about 80% sure that drivers don't have any actual data to back up their claims that they're going to start making less money. Non-strikers say that the credit card capability will not only help business (by bringing in those with corporate credit cards), but also increase tips. This information comes from cabs that are already equipped with the proper gizmos.

    What are taxi drivers trying to hide? What is this invasion of privacy talk? These drivers are working for a large company. I repeat, they're working. I don't demand a private office when I'm at work, and I don't see much reason drivers should care a whole lot. If someone is slacking, taking shady routes, or just plain doing something they're not supposed to do, then they should be held accountable. Unless I'm mistaken, I don't recall a whole lot of whining when San Francisco cabs had similar equipment installed.

    So stop the fuss, and just modernize up to the proper century, New York cab drivers. I'm sure Stamen Design and Cabspotting* would greatly appreciate it.

    *I am not associated with either.

  • 360 Variables Describing the United States

    September 5, 2007  |  Data Sources

    Order From Randomness Data Browser

    Order From Randomness has an extensive data collection featuring 360 variables describing all 50 states. The indicators are placed in 25 groups including birth rates, death rates, disease, environment, energy, nutrition, and education.

    Most of the data seems to range somewhere between 1999 and 2005, and I believe there are four variables that run to 2007. There's also a simple data browser featuring a distribution curve and some summary statistics. Generally, students seem to like the extensive set of variables, says one of my professors.

  • Decline of U.S. Men’s Tennis

    September 4, 2007  |  Statistics

    With more Many Eyes fun, Aron Pilhofer put in part 2 of his original post. I was pleased to see the first post get 56 comments, but I think part 2 might have gotten lost due to the high post frequency, with the U.S. Open fully on. Still worth a look though.
    Continue Reading

  • Exploring Twitter with Blocks

    September 2, 2007  |  Exploratory Data Analysis

    twitter-blocks

    On their new exploration section, Twitter Blocks is available for viewing and use. The viz is in Flash and is supposed to allow you to explore your neighbors as well as your neighbors' neighbors. I think the higher up the blocks are, the more recent. It's kind of hard to say. Other than that, I'm actually not really sure what I'm looking at. I thought it might be because I'm not following that many people, but I viewed the blocks for the public timeline and still had trouble deciphering. Maybe others will have better luck.

    Update: Michal posted on the feedback they've been getting on Twitter Blocks that's certainly worth reading:

    So we get this a lot: "Beautiful! But useless!". We've heard it in response to most projects we've done over the past few years (one exception has been Oakland Crimespotting, whose stock yokel response is: "no way am I moving to Oakland!").

    This kinda surprises me. I think their other projects are pretty useful and informative.

  • Breaking Up the Face into Elements

    August 29, 2007  |  Statistics

    I'm not even going to pretend I know anything about how Statistics and vision go together. That's not to say that they don't go together, because they do. Otherwise there wouldn't be a whole center at UCLA, the Center for Image and Vision Science, a group of statisticians, computer scientists, and psychologists. Lots of modeling involved, lots of data, and lots of applications from security to medical imaging to assisting the visually impaired.

    Nathan as a Baby

    With that being said, I came across Face of the Future, which was set up by a computer science group at the University of St. Andrews. They have a face transformer, averager, morpher, and detector. You can upload your own images for the transformer and averager. (The averager wasn't working when I tried it.) The transformer will do some image processing on your face, and from there you can see what you might look like as a baby, teenager, old adult, and different races. Fun stuff. I would show all the pictures from my little experiment, but they're kind of creepy.

    Nathan as a Simpsons Character

    On a somewhat related note: have you ever wondered what you look like as a Simpsons character? Well now you can see for yourself. Burger King and The Simpsons have joined forces to provide you with the Simpsonizer. Undoubtedly, there's some image processing and statistics flowing around in that black box. My Simpsons character actually looks quite a bit like me.

  • Don’t want to share our data / OK, what’re you hiding?

    August 20, 2007  |  Data Sharing

    I don't want my credit card numbers floating around, because then I'd be screwed. That kind of data needs to be locked up tight behind a billion firewalls, a locked safe, five armed guards, and another locked safe and then one more guard plus another safe. However, there are lots of other kinds of data that should be online and publicly available or at least accessible via a phone call.
    Continue Reading

  • U.S. News & World Report College Rankings are Now Available

    August 17, 2007  |  Data Sources

    The well-known college rankings are now available for your viewing pleasure. Whether the ranking system is legit or not, I'll let you be the judge, but I think everyone should take note that UC Berkeley was again the number one ranked public national university and UCLA was ranked number three. Go Calee-forn-ee-ah! In a nutshell, here's how U.S. News weights the criteria it uses to rank the universities:

    • Peer Assessment - 25%
    • Retention - 20% in national universities and liberal arts colleges and 25% in master's and baccalaureate colleges
    • Faculty Resources - 20%
    • Student Selectivity - 15%
    • Financial Resources - 10%
    • Graduation Rate Performance - 5%; only in national universities and liberal arts colleges
    • Alumni Giving Rate - 5%

    I wonder how much bias is in peer assessment.
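    To get a feel for how much that 25% peer assessment weight matters, here's a rough sketch of a weighted composite using the national university weights listed above. The indicator names, the 0-100 scale, and the example school are all my own assumptions for illustration. U.S. News doesn't necessarily compute its scores this way, so treat it as a toy.

```python
# Weights (in percent) for national universities, copied from the list above.
WEIGHTS = {
    "peer_assessment": 25,
    "retention": 20,
    "faculty_resources": 20,
    "student_selectivity": 15,
    "financial_resources": 10,
    "graduation_rate_performance": 5,
    "alumni_giving": 5,
}


def composite_score(indicators):
    """Weighted average of indicator scores.

    Assumes every indicator is already on a common 0-100 scale,
    which is an assumption of this sketch, not a documented step.
    """
    missing = set(WEIGHTS) - set(indicators)
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * indicators[name] for name in WEIGHTS) / total_weight


# Hypothetical school with made-up indicator scores.
example = {
    "peer_assessment": 90,
    "retention": 95,
    "faculty_resources": 80,
    "student_selectivity": 85,
    "financial_resources": 70,
    "graduation_rate_performance": 60,
    "alumni_giving": 40,
}

# Same school, but with peer assessment nudged up by 10 points.
bumped = dict(example, peer_assessment=example["peer_assessment"] + 10)

print(composite_score(example))
print(composite_score(bumped) - composite_score(example))
```

    Under these assumptions, a 10-point swing in peer assessment moves the composite by 2.5 points, which is the same effect as a 50-point swing in alumni giving. So if there is bias in peer assessment, it has plenty of room to show up in the final number.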

  • My Mission is to Collect Basic Data

    August 13, 2007  |  Data Sharing

    Pedometer

    I began my path of higher education at Berkeley as an Electrical Engineering and Computer Science student. As a stat graduate student, it's hard to remember sitting in all of those (boring) engineering classes.

    If I learned anything though, it was from the painful computer science projects. No matter how big the project, I would start by breaking it up into lots of mini-tasks and work my way up to the final solution. I think this has helped me a lot not only in grad school, but also in solving problems in my life. Hence, my first attempt at continuous data collection has started at a very basic level -- my pedometer.

    Continue Reading

  • The World Needs Statisticians

    August 9, 2007  |  Statistics

    While doing research on the process of rebuilding New Orleans after Hurricane Katrina and the U.S. Army Corps of Engineers, I've run across a close and knowledgeable watcher of the New Orleans rebuild: Robert Bea. I don't know much about him except that he seems like a very nice man. I found this on his Berkeley homepage:

    The world needs engineers who....

    • whose truth cannot be bought,
    • whose word is their bond,
    • who put character and honesty above wealth,
    • who do not hesitate to take chances,
    • who will not lose their identity in a crowd,
    • who will be as honest in small things as in great things,
    • who will make no compromise with wrong,
    • whose ambitions are not confined to their own selfish desires,
    • who will not say they do it "because everybody else does it,"
    • who are true to their friends through good report and evil report, in adversity as well as in prosperity,
    • who do not believe that shrewdness and cunning are the best qualities for winning success,
    • who are not ashamed to stand for the truth when it is unpopular, and
    • who have integrity and wisdom in addition to knowledge.

    Please help me to be this kind of engineer.

    Bob Bea

    This can certainly be applied to statisticians as well. Please help me be that kind of statistician.

    UPDATE: Just did some back and forth email with Professor Bea. He IS a nice man.

  • Proving the Non-experts Wrong

    August 7, 2007  |  Statistics

    UPDATE: I found the essay! Programmers Need To Learn Statistics Or I Will Kill Them All by Mr. Zed Shaw

    There was this online essay that I read by a guy in the computer science/electrical engineering field who totally loves statistics. He read textbooks, and truly spoke like someone who respects data. I thought I bookmarked it, but now have no clue where the heck it is. Argh :(. If anyone knows who I'm talking about, please tell me!

    He worked with a company where everyone thought they "knew" statistics. Automated reports would give them numbers, and they'd fully trust them. That was statistics to the computer engineers. Crunch some numbers and see what the software gives me. As a result, these engineer-types really pissed off the author of the article.
    Continue Reading

  • TED Talk: What do we really know about the spread of AIDS?

    July 13, 2007  |  Statistics

    In her TED talk, Emily Oster challenges our conception of AIDS and suggests other covariates that we need to look at (e.g. export volumes of coffee). Until we get out of the mindset that poverty and health care are the only causes/predictors of AIDS, we won't be able to find the best way to fight the disease. Another great use of data.

    I do have one small itch to scratch though. Emily had a line plot that shows export volumes and another line, on the same grid, of HIV infections, both over time. It reminds me of the plots that Al Gore uses with carbon dioxide levels and temperature. Anyways, using the plot, Emily suggests a very tight relationship between export volumes and HIV infections. Isn't export volume pretty tightly knit to poverty? I don't know. She's the economist, so she would know (A LOT) better than me. I guess I just wish she talked a little bit about the new and different data she has that compels us to change our conceptions.

  • Making Public Data Public

    July 11, 2007  |  Data Sources

    As Jon Udell has mentioned, there's a ton of data online, but we often can't find it because it's hidden in the deep, dark basement of some website. He has proposed that people bookmark public datasets on del.icio.us under the tag "publicdata". I think this is a great idea. In turn, you can subscribe to the feed with the url http://del.icio.us/tag/publicdata.

    I've been doing this already for a while, but I had been just tagging with "data". So I'm going to join in on the party and start tagging with publicdata, and I hope others will too. Until sites like Many Eyes and Swivel get more wind beneath their wings, I think it's necessary.

  • Finding Weirdness in Temperature Data

    July 9, 2007  |  Mistaken Data

    wunderplot500

    After parsing Weather Underground pages to grab temperature data, it's time to look at the data. Can't download all that data and not do anything with it!

    First off, in my initial pass of my parsing script, I accidentally cut the month range short, so I didn't get any data for December from 1980 to 2005. It should be noted that these plots don't show this missing data. Um, there are no axes or labels either. Sorry, I got a little lazy, but that's not the point now anyways.
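    For the curious, here's one way a month range can get cut short. This is a hypothetical reconstruction in Python and not the actual parsing script: the function names are made up, and the bug shown (a range whose upper bound is exclusive) is just a common way to lose December.

```python
def months_buggy():
    # range() excludes its upper bound, so this stops at 11 (November).
    return list(range(1, 12))


def months_fixed():
    # Use 13 as the upper bound so that 12 (December) is included.
    return list(range(1, 13))


def missing_months(collected):
    """Return the calendar months (1-12) that never got collected."""
    return sorted(set(range(1, 13)) - set(collected))


print(missing_months(months_buggy()))
print(missing_months(months_fixed()))
```

    A quick check like missing_months right after the download would flag the gap before any plotting happens.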
    Continue Reading

Unless otherwise noted, graphics and words by me are licensed under Creative Commons BY-NC. Contact original authors for everything else.