Proving the Non-experts Wrong

UPDATE: I found the essay! Programmers Need To Learn Statistics Or I Will Kill Them All by Mr. Zed Shaw

There was this online essay that I read by a guy in the computer science/electrical engineering field who totally loves statistics. He read textbooks, and truly spoke like someone who respects data. I thought I had bookmarked it, but now I have no clue where the heck it is. Argh :(. If anyone knows who I’m talking about, please tell me!

He worked with a company where everyone thought they “knew” statistics. Automated reports would give them numbers, and they’d fully trust them. That was statistics to the computer engineers. Crunch some numbers and see what the software gives me. As a result, these engineer-types really pissed off the author of the article.

The author went on to rant about his crappy days at the office when he would post figures and then be told by the know-it-all computer scientists that his numbers couldn’t be correct. As I read, and somewhat amusingly identified with the author, I thought, “With all my hopes for public data and accessibility for non-experts via visualization, how can we make sure the system doesn’t get abused?”

Same Old Story

Visualization can be really powerful, but just like Statistics, it can be easily abused or misunderstood. For example, the data could be spotty (i.e. missing values) or have some kind of weirdness to it, but someone who is careless (or just not in the know) might disregard that and plot the data as if it were good old clean, processed data. With his pretty plot, he can now go gallivanting around town claiming whatever he wants. Blah, blah, yeah, we’ve heard this story before.
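To make the spotty-data point concrete, here is a minimal Python sketch. The readings are made up, and `summarize` is a hypothetical helper written for this post, not part of any library. It compares the careless approach (missing values quietly turned into zeros) with one that reports how much data is actually missing.

```python
from statistics import mean

def summarize(values):
    """Return the mean of the observed values plus a count of missing ones."""
    observed = [v for v in values if v is not None]
    return {
        "n_total": len(values),
        "n_missing": len(values) - len(observed),
        "mean_observed": mean(observed) if observed else None,
    }

# Hypothetical daily readings; None marks a day with no measurement.
readings = [12.0, None, 15.0, None, None, 14.0, 13.0]

# The careless version: missing values silently become zeros.
careless_mean = mean([0.0 if v is None else v for v in readings])

report = summarize(readings)
print("careless mean:", round(careless_mean, 2))
print("honest report:", report)
```

The careless mean comes out far lower than the mean of the days that were actually measured, and a plot built on it would show dips that never happened. Reporting the missing count next to the number at least tells the reader how much to trust it.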

Two Types of Data Misrepresenters

Ok, so there are two types of people who misrepresent data. The first kind is just evil: those who purposely misrepresent to make themselves or someone else look good. It’s lying, and well, good riddance to that type of misrepresenter. There’s not much hope for those liars, and if you are one of those people, my fellow data scientists and I are going to find you and your crummy reports and expose you for the cruddiness you’ve brought to the world and those around you.

The second type is those who just don’t know any better. These people know that the data in their Excel spreadsheet (sigh) can be put to use, so they graph it, plot it, draw it, or whatever else they can do to make those numbers make a picture. Then they present it, bless their hearts, but they don’t quite know what they’re showing.

Back to Essay I Cannot Remember

Those engineers… I think they’re in the second group. Those engineers, I guess, are just a bit arrogant, thinking they know everything about Statistics just because their software graphs tell them so. From my experience though, as a former electrical engineering student, I honestly do believe that they mean well. Engineers generally want absolute truth, just the facts, and do their best to resist the fluff.

What I Can Learn From Unknown Essay and Unknown Author

Somewhere in the education system, for those engineers, the importance of Statistics got lost in a graph, a mean, or a median. Variance became meaningless. As I build visualization tools, I can’t forget about such important properties, and I have to remember that not everything is normally distributed (in fact, most things aren’t). With that in mind, there’s still hope for that second type of misrepresenter to become a proper representer.
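Here is a small Python sketch of why a lone mean can mislead. Both samples are invented for illustration, and `describe` is a hypothetical helper, not anyone’s real reporting code. The two samples share the same mean, but the median and the spread tell very different stories.

```python
from statistics import mean, median, pstdev

def describe(sample):
    """Summarize a sample with its center (mean, median) and its spread."""
    return {
        "mean": mean(sample),
        "median": median(sample),
        "stdev": pstdev(sample),  # population standard deviation
    }

# Two hypothetical samples with identical means.
steady = [48, 49, 50, 51, 52]    # tightly clustered around 50
skewed = [10, 10, 10, 10, 210]   # one extreme value drags the mean up

for name, sample in [("steady", steady), ("skewed", skewed)]:
    print(name, describe(sample))
```

A report that shows only the mean would call these two samples the same. Put the median and the standard deviation next to it and the skewed one stands out right away, which is the kind of thing a visualization tool should surface rather than hide.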

*I’ve also learned to bookmark things in del.icio.us immediately, because my memory stinks. Sorry unknown author man.