I began my path of higher education at Berkeley as an Electrical Engineering and Computer Science student. Now that I'm a statistics graduate student, it's hard to remember sitting in all of those (boring) engineering classes.
If I learned anything though, it was from the painful computer science projects. No matter how big the project, I would start by breaking it up into lots of mini-tasks and work my way up to the final solution. I think this has helped me a lot, not only in grad school but also in solving problems in my life. Hence, my first attempt at continuous data collection has started at a very basic level -- my pedometer.
My Big Problem: Advanced Data Sharing
For the past year, I've been a part of the Center for Embedded Networked Sensing. It's a very cool, very large research group centered at UCLA with groups at UCR, USC, UCM, and Caltech. There's also a super-techy test bed up in the San Jacinto Mountains.
In a nutshell, we have tons of small sensors that collect environmental data, which goes straight to our database application, SensorBase. New data comes in about every five minutes. Then from another angle -- the urban angle -- we have networks of mobile phones that collect data in the form of audio, images, and GPS coordinates.
As you can imagine, there's a lot to think about, starting at the hardware and working on up to the data. I focus on the data (naturally). The database is populated with millions of rows of data and is continuously growing. How can we visualize it? How can we be sure it's reliable? Is the data secure? Can we search through data that comes in so many different forms?
My Mini-task: Basic Data Collection
OK. "BREATHE," I tell myself. Big problem. Break. Into. Small. Pieces.
So taking a step back from the fancy sensors and Python-enhanced Nokia N80s, I've turned to my super-simple pedometer. It only collects the number of steps I take and it only has one button -- the reset button. I've been wearing the thing on and off for about a month and to be honest, I'm surprised by how much I've learned about data collection and sharing.
Collecting My Daily Steps
Here was my simple plan. I would clip my $5 pedometer, which is amazingly accurate, to my waist every morning. That's it. It seems easy enough, but there would always be days I forgot to wear it even though I put it right next to my wallet. I forgot to wear it a lot, especially at the beginning of my task.
Another problem was that I usually didn't put on the pedometer until I left the house. Some days, I didn't get a move on until mid-afternoon or even the evening, so I missed all the steps I took from the living room to the bathroom and then back to the living room and then to the study.
Logging My Daily Steps
At the end of each day, I would log the number of steps in Excel. Easy enough. However, sometimes I didn't have access to my computer, so I would write the number down on a piece of paper. Man, I can't even count how many times that piece of paper got thrown away. Then there were days when I would simply forget to log the number of steps.
To combat the first problem, I decided to use Google spreadsheets to log my steps. That way, I could use any computer. But then some days I wouldn't have internet, and the numbers would get lost like before.
Where to Go From Here
Data sharing is great, but when the data is crummy and inaccurate, there's really no point. Good data sharing starts with good data collection.
In my pedometer experience, it seems the main problem is consistency. I kept forgetting, so my data is spotty and, well, it sucks. That makes me not want to share my data, which is bad. It could also be bad if I did share my data. It could get mistaken for clean data somewhere along the telephone line, which is a problem with a lot of the data that's out there.
We have to improve our data collection and logging somehow, but how? Is it possible for any citizen scientist, Joe Schmoe, to collect good, wholesome data? I'm not so sure, at least not yet.
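One small way to keep spotty data from passing as clean data is to make the gaps explicit before sharing anything. Here is a minimal Python sketch of that idea. The dates and step counts are made up for illustration (they are not my real log), and the function names find_missing_days and coverage are just names I picked for this example.

```python
from datetime import date, timedelta


def find_missing_days(log, start, end):
    """Return every date from start to end (inclusive) that has no logged step count."""
    missing = []
    day = start
    while day <= end:
        if day not in log:
            missing.append(day)
        day += timedelta(days=1)
    return missing


def coverage(log, start, end):
    """Return the fraction of days in the range that actually have a logged value."""
    total_days = (end - start).days + 1
    logged_days = total_days - len(find_missing_days(log, start, end))
    return logged_days / total_days


# Hypothetical log: date -> steps. Days I forgot are simply absent.
step_log = {
    date(2024, 1, 1): 8421,
    date(2024, 1, 2): 5310,
    date(2024, 1, 4): 10244,
    date(2024, 1, 7): 3120,
}

start, end = date(2024, 1, 1), date(2024, 1, 7)
missing = find_missing_days(step_log, start, end)

print("Missing days:", [d.isoformat() for d in missing])
print("Coverage: {:.0%}".format(coverage(step_log, start, end)))
```

Shipping the list of missing days and the coverage number along with the steps wouldn't fix my forgetfulness, but at least anyone further down the line could see how spotty the data is. A fuller version might also flag the days when the pedometer went on late, since those counts are too low rather than missing.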