• This just might be nerdy statistics overload even for me. A group from the Johns Hopkins biostatistics department has created parodies of Sir Mix-A-Lot’s Baby Got Back and MC Hammer’s Too Legit To Quit. For your listening pleasure – Baby Got Stats and Too Logit.

    The songs are in MP3 format, so you can put them on your iPod and play them over and over and over again. One play-through was enough for me, but clearly, it’s only a matter of time before this biostat group hits the mainstream.

    [via Freakonomics]

    Update: Here’s the video version for your viewing pleasure.

  • Wondering what statistics is for? This is what.

    Data are a whole lot of meaningful patterns. We can generate data indefinitely, we can exchange data forever… we can store data, retrieve data and file them away. All this is great fun and maybe useful, maybe lucrative, but we have to ask why. The purpose is regulation and that means translating data into information. Information is what changes us. My purpose is to effect change – to impart information.

    Platform for Change by Stafford Beer

  • A quick reminder – there are just three more days to put in your contest entry to win Edward Tufte’s Visual Display of Quantitative Information. Leave a comment on any FlowingData post from March 19 through this Sunday, March 30. I’ll announce the winner on March 31. Good luck!

    Here’s the original contest announcement for those who missed it.

  • If I’ve learned anything about designing information graphics, it’s that attention to detail and small changes make a mediocre graphic into a really useful and usually more attractive one. It’s what sets New York Times graphics apart from those in other publications and especially those in academic papers. Something like a short annotation can add context or a line shifted slightly to the left can make data look less cluttered.
    Read More

  • While we’re on the topic of what you plan to do with your PhD in statistics – UCLA department of statistics recently announced that it is looking for a new professor.

    Applications and nominations are invited for the position of Professor of Statistics, any level (tenure-track Assistant Professor, tenured Associate Professor or tenured Full Professor), in the Department of Statistics at the University of California, Los Angeles.

    The position targets candidates with high quality research, a strong teaching record, and with expertise preferably in one or more of the following areas: Environmental Statistics, Social Statistics, and Spatial Statistics. Qualified candidates must have a Ph.D. in Statistics or Biostatistics. The position is effective July 1, 2009.

    The UCLA department of statistics is one of the best stat programs in the country, with a talented faculty and really cool students. Granted, I might be a little biased, but still. If you’re interested, go for it; or if you know anyone who might be qualified, do them a solid and forward them the information.

  • Statistics graduate students at Columbia University are hosting a symposium on careers for PhDs in statistics.

    Current confirmed speakers include industry statisticians at Google, AT&T Labs-Research, National Institutes of Health, and Pfizer, Inc and academic statisticians from statistics, marketing, and biostatistics departments at Columbia University, University of Pennsylvania and Rutgers University.

    The Symposium will be held at Columbia University in New York on April 4, 2008 from 1-5pm. A wine and hors d’oeuvre reception will follow so that there will be ample time to chat informally with our guests, and a student mixer after that is also in the works.

    The conference is free and they’re offering a $40 travel reimbursement for students who would like to attend. Consider going if you’re in the area. It should be interesting. Here’s the online registration.

    If anyone actually does end up going, let me know. I’d love for you to share your experience here. For the current and future stat PhDs or masters students, what are you doing or planning to do with your degree? Other than framing it, I’m still searching for my answer.

    [via Statistical Modeling]

  • Just when you thought it was safe to upload those photos from that wild Friday night to Facebook, this happens:

    A security lapse made it possible for unwelcome strangers to peruse personal photos posted on Facebook Inc.’s popular online hangout, circumventing a recent upgrade to the Web site’s privacy controls.

    The dumbest part is how easy it has been all this time to find private photos. All it took was a modified URL with a photo ID to “hack” into Paris Hilton’s, Mark Zuckerberg’s, or anyone else’s private albums. I don’t know the whole story, but given Facebook’s excellent reputation, you’d think that they would know better. The security hole has been plugged for now, and I am sure the Facebook group is working hard to make sure there are no other leaky areas.

    This leak probably couldn’t have been more poorly timed for Facebook, coming right after the release of their new security measures and not long after MySpace’s all too familiar photo breach.

    This really makes you wonder – what’s next?

    Photo by Meredith Farmer

    [via ReadWriteWeb]

  • Dash, an Internet-connected GPS device, is going to change the way you drive by making use of traffic data. Where does the data come from? Well, that’s the best part.
    Read More

  • Billions of watts are wasted every year including 1955, 1985, and 2015. Be kind to the environment and keep your speed under 88 miles per hour. The space-time continuum appreciates it.

    Roads? Where we’re going, we don’t need roads.

  • Google recently released a visualization API that allows you to share embeddable visualizations on your website, create Google Gadgets that can be shared and reused, and create extensions for existing Google products. Andrew asks, “Will this shape the future of data visualization online?”

    On one side, this is exciting for the visualization field, because when Google talks, everyone listens. On the opposing side, could this be another Google Maps type of thing? Google Maps was cool at first, but now, mashup after mashup has left me bored and disillusioned. Ultimately though, I like to think that this API is going to benefit all of us.

    What the API Offers

    There’s a slew of charts, graphs, gidgets, and gadgets available that you’ll see in the gallery.

    Time Series

    I’m sure this Google Finance-looking graph will make a lot of people happy.

    Time Series

    Gauges

    These are, um, interesting.

    Gauges

    Maps

    We’ve seen this before, but the difference here is that it’s now in widget form, which means a hook into Google Docs and other apps.

    Maps

    How We Will Benefit

    If Google visualization becomes popular, visualization, in general, grows in popularity. People who weren’t exposed will now know more, and if all goes according to plan, data awareness has a chance to develop.

    As an example, Google Maps made online mapping what it is now – commonplace. Remember when online mapping was limited to the big boys? Now everyone can mash up to their heart’s content. People know how to use it and similar mapping applications, and because of that, more “idea people” ask for mapping. As a result, there is more opportunity.

    Similarly, with the data viz API, we’ll see data mashups outside of the map. Data visualization will no longer just be for the big boys, but at the same time, we’ll still be able to make our own designs with a wider audience ready to experiment and play.

    Good or Bad?

    What do you think? Is the Google visualization API going to limit our imagination and leave us stuck in a Google-ish funk, or is data and visualization awareness ready to rise to a point where we all benefit?

  • Speaking of contests, the Applied Statistics Center at Columbia University is holding its first annual art contest.
    Read More

  • FlowingData reached a long awaited milestone yesterday – 1,000+ subscribers. Thank you to everyone who has subscribed, commented on, and linked to FlowingData. To extend my thanks, I’m running a (very easy) contest to win Edward Tufte’s milestone book – The Visual Display of Quantitative Information.

    Reaching 1,000 Subscribers

    I started FlowingData ten months ago not really knowing what to do with it. My new hosting plan came with a free domain, so I thought, hey, FlowingData. One month in, I started this blog to convince people that statistics was more than their least favorite class in college.

    Since then, it has been my goal to reach 1,000 subscribers. It has been my goal to find 1,000 people interested in data or get them interested in it. This happened yesterday. FlowingData now has 1,025 subscribers – thanks to all of you!

    How to Win Visual Display

    So now I’m going to make it really easy for you to win a copy of Edward Tufte’s Visual Display. All you have to do is leave a comment on any new post (this one included) during the next 10 days. On March 31, I’ll randomly select a winner. The more comments you leave, the higher your chances of winning.

    The comment should add to the conversation, and trackbacks and pingbacks don’t count. That’s it. If you have a valid mailing address that Amazon can deliver to, you can win. Oh, and make sure you leave a valid email address so that I can contact you when you win. Good luck and again, thank you, everyone.
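    For the curious, a drawing like this can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions, not necessarily how the winner will actually be picked: it assumes one list entry per qualifying comment, and the function names and sample commenters are made up for illustration.

```python
import random
from collections import Counter


def draw_winner(comment_authors, seed=None):
    """Pick one qualifying comment uniformly at random and return its author."""
    if not comment_authors:
        raise ValueError("no eligible comments")
    rng = random.Random(seed)  # a seed makes the drawing repeatable
    return rng.choice(comment_authors)


def win_probabilities(comment_authors):
    """Each author's chance of winning is their share of all qualifying comments."""
    counts = Counter(comment_authors)
    total = len(comment_authors)
    return {author: n / total for author, n in counts.items()}


# Hypothetical comment log: one entry per qualifying comment
comments = ["ana", "ben", "ana", "cy", "ana"]

print(win_probabilities(comments))
print(draw_winner(comments, seed=31))
```

    Because the draw is over comments rather than over people, someone who leaves three of five comments has three times the chance of someone who leaves one.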

    Next Step: 5,000 Subscribers

    Let’s get more people talking about data and visualization and find those who already are. I know that there are hundreds of thousands of people we haven’t reached yet. If you could take a few seconds to email one friend about FlowingData, I will super appreciate it.

  • Email has grown to be a huge part of our lives and is very much commonplace. We can connect with others in just a few clicks. With all the email sent per day, how can we understand these connections? How can we visualize the type of email we’ve been sending? Can we tell a story somehow with the thousands of emails we’ve sent, received, and deleted?

    These 21 email visualizations investigate. I’ve split them up into six categories – exploratory, analytic, mapping, metaphor, networks, and abstract.
    Read More

  • I stumbled across this dataset covering piracy of Oscar-nominated films over the last 6 years and a short analysis.

    Despite the Academy’s efforts to crack down on bootlegging, its attempts haven’t done a whole lot. Focus on stopping one area, like downloading, and another area, like Region 5 DVDs from overseas, just grows more prolific. A quick search in the right places will show you that piracy isn’t going away any time soon.

    I even met someone whose job it was to find people who were “seeding” films through BitTorrent and to report them to police. I got the impression that it was a really tedious process and that people go uncaught most of the time. I’m uh, not condoning this, but if you don’t want to get caught, just make sure you stop the torrent once you’ve got your file.

    Bootlegging on Seinfeld

    Bootlegging always reminds me of the Seinfeld episode when Jerry somehow gets caught up in a bootlegging scheme:

    [T]here was a kid couldn’t have been more than ten years old. He was asking a street vendor if he had any other bootlegs as good as Death Blow. That’s who I care about. The little kid who needs bootlegs, because his parent or guardian won’t let him see the excessive violence and strong sexual content you and I take for granted.

    For those interested (and I know you are), the term bootleg originates from hiding flasks of liquor in the legs of boots. Ahoy, matey.

    Photo by mumelopics

  • Two weeks ago, I vowed to stop procrastinating using two strategies:

    1. Make a to-do list every night to lay out what will get done the next day
    2. Enable the Greasemonkey script – Invisibility Cloak – which will block all the sites that I waste too much time on except during lunch and on the weekend

    Since I enabled the plugins and started to-do lists, my browsing time has gone down a whopping 3.5% – from 10.11 hours per day to 9.76 hours per day. Ok, it doesn’t sound like much, but there’s a bit more to the story.
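    For anyone who wants to check the math, here is a quick Python sketch of the percent decrease. The two hour figures come from the numbers above; the function and variable names are just ones picked for illustration.

```python
def percent_decrease(before, after):
    """Return how much smaller `after` is than `before`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


before_hours = 10.11  # average browsing hours per day before the experiment
after_hours = 9.76    # average browsing hours per day after two weeks

drop = percent_decrease(before_hours, after_hours)
minutes_saved = (before_hours - after_hours) * 60

print(f"{drop:.1f}% less browsing, or about {minutes_saved:.0f} minutes per day")
# prints "3.5% less browsing, or about 21 minutes per day"
```

    Put in minutes instead of percentages, that is roughly 21 fewer minutes of browsing a day.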

    Growing More Productive

    Even though the time decrease isn’t much, I’ve still been more productive than when I wasn’t trying to improve. Since all of my favorite sites – Facebook, Google Reader, this blog – are blocked during the day, I spend more time reading papers and researching stuff I’m supposed to be looking for.

    Planning to Improve More

    Productivity has gone up, but there’s still room for improvement. There have been days when I did not feel like working, so I cheated, and turned off the plugins and scratched the to-do list. As a result, I wasted a lot of time.

    On the days I feel blah, I’m going to avoid turning off the plugins and see where that takes me. I will also work on creating more specific to-do lists the night before, because when I put in vague tasks like “go over papers,” they didn’t really get done. However, if I put in “read paper X, paper Y, and summarize each,” then it usually got done.

    Failed Tactic

    I also tried hiding the dashboard (I have a Mac) so that I couldn’t see that I had new emails, but that just (as embarrassed as I am to admit) left me wondering more. I would keep checking, which seemed to waste more time.

    I’ll put in my final report in two weeks.

    How’s everyone else doing?

  • Jose Luis Vicente and Irma Vilà, in collaboration with Bestiario, have created an interactive installation in Flash that allows you to explore the radio spectrum – the electromagnetic space covering signals from radio and television to GPS, Bluetooth, and mobile phones. The piece represents a database of projects and services (in the radio spectrum) developed over the past decade.
    Read More

  • Hi, Boing Boing readers. Welcome to FlowingData. For the new visitors, here’s the rundown (and for the old visitors, welcome back). My name is Nathan, and I’m a statistics graduate student / computer science graduate obsessed with data and visualization. Here on FlowingData I cover how statisticians, computer scientists, designers, and other experts use data to help us better understand ourselves and our surroundings.

    For more details, check out the about page and feel free to contact me if you have any questions. If you like what you see, you might want to subscribe to the feed.

    Again, thanks to David and Boing Boing for linking here, and again, thanks to Mike for making the suggestion!

  • In light of the MySpace photo breach (due to their negligence) a couple…

  • I just created a new Twitter account, and it got me thinking about all the data visualization I’ve seen for Twitter tweets. I felt like I’d seen a lot, and it turns out there are quite a few. Here they are grouped into four categories – network diagrams, maps, analytics, and abstract.

    Network Diagrams

    Twitter is a social network with friends (and strangers) linking up with each other and sharing tweets aplenty. These network diagrams attempt to show the relationships that exist among users.

    Twitter Browser

    Twitter Browser

    Twitter Social Network Analysis

    The ebiquity group did some cluster analysis and managed to group tweets by topic.

    Twitter Social Network Analysis

    Twitter Vrienden

    Twitter Vrienden

    Twitter in Red

    I’m not completely sure how to read this one. It looks like it starts from a single user and then shoots out into the network.

    Twitter in Red

    Twitter Network

    Twitter Network

    Read More

  • Wired Magazine recently did a feature on data-driven art.

    The above image is Jason Salavon’s work that shows U.S. population by county. The technically-minded readers might be thinking, “I don’t get it. What am I seeing here? I don’t even know what county has the greatest population.” I understand where you’re coming from, but hey, it’s art not a status update.
    Read More