Data Everywhere, Statisticians Anywhere

I had the honor of delivering the commencement speech at the UCLA Statistics graduation this past weekend. I’m going to put this here for posterity before my memory tucks it away never to be uttered again. I truncated the speech at the last minute, so these notes are a little more coherent than my delivery.

A big congratulations to all of you! It took a lot of work, a lot of distributions, sampling, and debugging in R to get here today, but you made it. Today’s your day.

For me, it’s weird standing up here seeing all you statisticians. When I was in undergrad, statistics was more of a required course than a field of study. I was an electrical engineering major, but around the end of my third year, I decided it wasn’t for me. I told my parents that I was going to grad school for statistics instead. It was quiet for a while. And my parents, who are all about finding what you love and going for it, asked, “Are you sure about this… statistics thing? Are you going to be able to find a job with that?” When I told my future father-in-law, he wasn’t so thrilled about it either.

So there was a ton of confidence back then in the future of statistics, especially in my family. So was I sure about statistics? Sort of? I didn’t know what I was going to do after school. I knew I liked to poke at data though, deciphering what all those numbers meant. It kind of felt like a magic power. When everyone else ran from distributions, I was having fun.

It’s a different story these days. Statistician is the sexy job of the decade, landing itself in lists of top jobs year after year.

I started to feel the shift during my second year here.

Like I said, I came to UCLA with a vague idea of what I wanted to do with statistics. The most exciting thing to me was the applied nature of it. I could use stat with a bunch of topics. I had statistics education in mind in the beginning. Looking at how we could teach younger kids complex concepts in an engaging way. Rob Gould showed me the possibilities. Then I shifted to data visualization, looking at how we can use interactive charts and graphs to understand data better. How it played a role in the everyday. Mark Hansen showed me the possibilities. Then I found myself at The New York Times making data graphics for the news. Then collaborating with graphic designers and artists for museum exhibits. And this was all as a UCLA stat student. A few years out from the PhD, I’m so thankful that my advisers and professors set me up with strong foundations and then gave me the flexibility to find what I really wanted to do. I’m sure all of you can relate.

Now look at me. I blog for a living. The other day, my wife’s co-worker said if she saw “blogger” on someone’s Tinder profile, she would swipe left in a heartbeat, which means instant rejection for those unfamiliar. So yeah, I have that going for me.

Of course, if we’re gonna be real, we all know you lead with “statistician” in your profile. Everyone’s gonna swipe right when they see “statistics.” It gets people’s attention these days, whether it be for online dating or for a job search.

But back to blogging, the greatest career move ever. I’ve learned a lot by blogging about statistics over the years. It turns out there’s a lot to gain by dealing with trolls and know-it-alls on the internet, which is why I’ll use the rest of my time to bestow upon you the three most important things I’ve learned as a blogger. Hopefully you can use them in your future stat careers, and maybe one day, you too can be a blogger like me.

So let’s get to it.

Lesson 1.

As evidenced by my own career path, statisticians can now work anywhere there is data, and data is everywhere. It’s not just in cubicles, in big tech companies, or in academia. With more and more of our lives taking place online, we produce data like never before, and there are a lot of people, groups, and companies that want to understand these new streams.

Running a site online, I tend to get a lot of recruitment email and business inquiries from these interested people, which gives me a good idea of who’s looking for help with their data. Interest is all across the board, ranging from tech and business analytics, to journalism and publishing, to non-profits and humanitarian efforts, to retail, sports, gaming, government, academia, and all the way to art galleries and children’s museums.

The spectrum for where you can go is really wide. On one end there’s the analytical side of data where you draw quantitative insights for data-driven decisions, and then on the other end, there is the beauty and stories in data that are more qualitative. Looking at what data represents and the social implications of it all. It’s not as easily measured, but equally important. UCLA statistics has given you the skills to make yourself indispensable across the spectrum.

A weird thing though is that the job titles are rarely “statistician.” It’s always data analyst, data engineer, data scientist, data journalist, data something or other. And as new grads, you might feel a little bit of imposter syndrome start to creep in. A feeling that maybe you don’t have what it takes. But you do have what it takes. If you look at the job requirements, you almost always see “statistics degree or equivalent.” I think that’s telling of where statistics is headed and what it’s grown into over the years.

Bottom line: You, with the stat degree, can be a data scientist. You can be a data engineer. And yes, you can even be a blogger. The job title doesn’t matter. You’re ready for it and have what it takes to learn anything you don’t know yet, because at the core, you’re a statistician.

Lesson 2.

People care about data. They care about statistics. And it’s not just the nerdy people anymore. It’s not just the people who have data or the ones who analyze it. Millions of people around the world are interested in probabilities, simulations, distributions, space, time, relationships and uncertainty.

Sports commentators talk about analytics during general broadcasts now. ESPN acquired the rights to political data blog FiveThirtyEight. The New York Times has a prominent data-centric section called The Upshot. Even I, a one-man shop – a statistician working from a home office – am able to prosper. There’s so much data on so many different topics that people are eager to learn about what’s going on in the world or with themselves, from a statistical point of view.

When I started FlowingData, I just wrote to connect with other data people. I had just finished my master’s and I had to move to Buffalo because my wife was starting her medical residency there. It was my outlet and a way to catalog different types of visualization as I tried to finish my PhD remotely. So it was a few hundred people who read at best. Probably fewer. Maybe five. Now it’s on the scale of millions. Just to look at charts and graphs and read about statistics. That’s still crazy to me, even though I’ve been doing it for a while.

A few months back, I used openly available data on mortality from the Centers for Disease Control and Prevention. The data has been free to download for years, as evidenced by the challenging interface to access the data. But millions of people around the world visited FlowingData to interact with charts and graphs. Crazy. It was a similar case for a simulation I made to show the average day for Americans. I used a Monte Carlo method, and the R code was similar to what I used for my dissertation. It got the attention of millions. I won design awards. Me, a statistician.

I mention these not to brag. What I hope that you can see is the scale at which you can reach others who maybe don’t even know they like data. The audience isn’t just other statisticians or other data people anymore. It’s the general public. With so many interested in statistics, that can only mean great things for you.

For me, most importantly, my parents aren’t hesitant about my career path anymore, and my now father-in-law gave me his blessing to marry his daughter.

Lesson 3.

You can use this heightened interest to your advantage. You can use all these new capsules of attention that used to wander elsewhere as a way to teach statistics and improve data literacy for people at all levels of stat knowledge.

There’s really no better student than one who is eager to learn. That makes the job easy, because you can give someone the material and they soak it up like a sponge. Just ask my three-year-old who’s nerding out to superheroes every waking minute.

However, the increased interest also means a lot more people who think they know it all because they read an introductory book or article on statistics, or in my case, visualization. They troll you. They talk down to you when maybe they should be asking you for advice.

A younger, ruder version of me would just mutter profanities under my breath. Or, I’d just shake my head and tell myself those sorts of people should go learn some real statistics.

That’s the easy way out though, and nobody gets anything out of that interaction. Besides, it’s not what statisticians do. We’re all about rigor and due diligence and paying close attention to the tiny details. Question every single digit.

Where others barely graze the surface, you know how to examine data in depth.

You know that correlation doesn’t equal causation, but you also know how to find out when it does. You know how to find trends and patterns, but you also know when you’re just looking at statistical noise. You know how to lie with statistics, but most importantly, you know how to tell the truth with statistics.

So use these moments of angst that you’ll inevitably come across as a chance to educate others. Especially these days, finding the truth with data is more important than ever. Embrace the responsibility, and we’ll all come out better in the end.

At the same time, it goes the other way. Never stop learning. Accept that you don’t know everything and seize opportunities to collaborate with others across the data spectrum. Sometimes statisticians get stuck in a stat bubble, where analysis and theory rule over everything else data-related. Obviously, that’s our strong suit, so that makes sense, but others have their own strengths, whether it be technical know-how, research in visual perception, experience with data presentation, or telling stories with data. The best work hands-down comes from those with a multi-disciplinary mindset. So keep learning. Data is a lot more fun that way.

Besides, you don’t want to get stuck in that ivory tower. It gets lonely up there. Super cold, dank. Sometimes the toilet breaks unexpectedly and you have to walk all the way down the stairs because there’s no elevator. It’s just not a good place to be.

So there you have it. Three lessons.

Lesson 1. Data is everywhere, and therefore, you can go anywhere and make a difference.

Lesson 2. People care about data, and it’s on a much bigger scale now, which makes statistics that much more exciting.

Lesson 3. With so much attention, it’s your responsibility to teach others about data, which also means we must never stop learning.

And now, all of you can be bloggers too in addition to your future stat careers. Woo.

Whatever you do at the end of the day, when presented with so many possibilities, it’s my hope that you choose to do good with your magic powers. I have little doubt though because you graduated from UCLA statistics.