Synchronized Swimming in Data and the Water Metaphor

Posted to Statistics  |  Nathan Yau

The flood. The avalanche. The tsunami. Drowning in data. For the past few years, a couple of times a week, there’s been an article about all the data we have access to and how we’re struggling to stay afloat in the growing sea of it. Big data is getting too big, they say.

The water metaphor is fine, but the fear of the data flow is irrational, so let’s swim with the former.

First, the floods and tsunamis. These images make it sound as if the flows of data are uncontrolled, and there’s no way to contain them when they storm to our doorstep. However, you could just turn off the nozzle, or filter, cut, and query down to what you want. And these days, we have practically infinite storage, so although the source of water increases, the reservoir and dam can grow with it.

Hit delete if the data gets too big, or bottle it and save it for later.

When you learn to swim, you start at the shallow end, and work your way towards the deep end. If you’re more adventurous, you snorkel or go deep sea diving. Even then you don’t swim the entire ocean right away. You explore a little bit at a time. You chip away at the database a little bit at a time, and what you learn during one dive carries over to the next. Sure, people drown in the ocean, but the good news is that when you drown in data, you still get more chances to learn and try again.

You don’t even have to go to the ocean right away. Although they’re small, ponds and lakes can be interesting too. (Think hyperlocal.) If you really must, make friends with one of those guys from Deadliest Catch and have him bring you what you want to study. Better yet, become a greenhorn with the best captain you can find and learn the subtleties of the sea.

At the end of the day, more data means more opportunity, and if you know how to doggy paddle, you’re going to be okay. Know how to do more than that? All the better. Keep learning.

Even if you can’t swim, there are a variety of alternatives — life preservers, rafts, and those inflatable floating chairs with the cup holder to put your ice cold beer. Drowning in data? Nah.


  • That’s a really great extension of, and counterpoint to, the standard analogy.

    As another point, if you start feeding water into a closed system and don’t keep it circulating, it rapidly goes stagnant. Stagnant water is unpleasant, and generally not useful. Simply collecting and storing data, and not slicing and dicing it to see how it’s changing, creates a similar situation.

  • I recently saw a talk by a bioinformatics guy working in genetic sequencing. Apparently the rate of growth in genetic data is even steeper than the rate of growth in data storage technology. Both are growing and accelerating astonishingly fast, with no sign yet of an upper limit any time soon, but genetic data is growing so much faster they’re expecting it to exceed the total amount of data storage realistically available by (if I remember right) about 2030. That’s taking into account and projecting the insane dropping in prices and increasing in capacity of data storage. There are people working on algorithms to judge what data to discard to minimise the risk of losing something that will turn out to be vital.

