The Simply Statistics unconference just started a few minutes ago. Tune in live below. (Or, catch the recorded version if you're late.)
The Onion tackles data privacy:
"As a law-abiding resident of this nation, I have the right to do whatever I want without a shadowy organization recording my every move, unless of course it's part of an electronic campaign designed to figure out, based on all of my emails and phone conversations, what types of clothes, shoes, and houseware products I like. Then it’s fine." Sources later confirmed that Landler had posted a Facebook rant on the issue, which had generated a pop-up ad from a company that restores lost PC data.
With all the stuff going on with surveillance and data privacy — especially the past week — it's worthwhile to revisit this essay by Daniel J. Solove, a professor of law at George Washington University, on why privacy matters even if you "have nothing to hide."
"My life's an open book," people might say. "I've got nothing to hide." But now the government has large dossiers of everyone's activities, interests, reading habits, finances, and health. What if the government leaks the information to the public? What if the government mistakenly determines that based on your pattern of activities, you're likely to engage in a criminal act? What if it denies you the right to fly? What if the government thinks your financial transactions look odd—even if you've done nothing wrong—and freezes your accounts? What if the government doesn't protect your information with adequate security, and an identity thief obtains it and uses it to defraud you? Even if you have nothing to hide, the government can cause you a lot of harm.
"But the government doesn't want to hurt me," some might argue. In many cases, that's true, but the government can also harm people inadvertently, due to errors or carelessness.
You might not have anything to hide right now, but maybe a random string of choices that was completely harmless looks a lot like something else a few years from now, to someone sniffing around the archives. The patterns when there are no patterns sort of thing. Personal data without the person. [via @hmason]
Using DNA as a storage device, Harvard researchers managed to store one million gigabits of data per cubic millimeter.
Biology's databank, DNA has long tantalized researchers with its potential as a storage medium: fantastically dense, stable, energy efficient and proven to work over a timespan of some 3.5 billion years. While not the first project to demonstrate the potential of DNA storage, Church’s team married next-generation sequencing technology with a novel strategy to encode 1,000 times the largest amount of data previously stored in DNA.
So does this qualify as big data or super tiny data?
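For scale, here's a back-of-envelope conversion of the density quoted above, assuming decimal units (a gigabit as 10^9 bits). The numbers simply restate the "one million gigabits per cubic millimeter" claim in more familiar terms, nothing here comes from the study itself.

```python
# Restate the quoted DNA storage density in more familiar units.
# Assumes decimal prefixes: 1 gigabit = 10**9 bits, 1 TB = 10**12 bytes.

gigabits_per_mm3 = 1_000_000               # density quoted in the article
bits_per_mm3 = gigabits_per_mm3 * 10**9    # 10**15 bits (1 petabit)
bytes_per_mm3 = bits_per_mm3 / 8           # 1.25e14 bytes

terabytes_per_mm3 = bytes_per_mm3 / 10**12
print(f"{terabytes_per_mm3:.0f} TB per cubic millimeter")  # 125 TB
```

So a speck of DNA the size of a grain of sand would, by this figure, hold on the order of a hundred terabytes.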
Thousands of people have attended Edward Tufte's one-day course on data graphics. Robert Kosara did not like it.
My advice? Buy his books. Read them. They're good. Just realize that you're getting a historical perspective on data visualization, not the cutting edge. Understand that Tufte's ideas are a good starting point, not a religion. There are many things that Tufte doesn't know, including pretty much anything related to visual perception and cognition, recent work (less than 30 years old), and interaction.
I've never been, but that's sort of what I expected. Has anyone had a different experience with the course?
Update: Lots of good stuff in the comments. The consensus seems to be good/great for beginners, and others should stick to the books for refreshers.
Last week I attended the 29th annual symposium at the Human-Computer Interaction Lab at the University of Maryland. The HCIL is famous for a little thing known as the treemap, created by the founder of the lab, Ben Shneiderman. It's famous for lots of other visualizations and people too, but it's best known for the treemap.
The annual symposium is put on by the lab to showcase its latest and greatest research. I sometimes forget that HCIL focuses on things other than visualization, so I had to sit, confused, through a few talks before I realized they weren't about visualization ("Where's the viz?" I was thinking). I won't fault them for not being all about dataviz; Social Network Analysis Strategies for Surviving the Zombie Apocalypse, by lab director Jen Golbeck, was thoroughly entertaining and insightful work on social networks.
The series premiere of United Stats of America (See what they did there?) on History is tonight at 10/9c.
Episodes explore the stats that help us understand how much money we make (and what we spend it on), how long we will live (and how we will die), what we do with our free time (and how to make more of it) and a whole lot more. In one episode, the Sklars explain how the deadliest animal in America is neither the snake nor the shark but rather the deer. In another, viewers learn that Americans waste 4.2 billion hours a year stuck in traffic and that, in a nation with over 3.5 million square miles of territory, 99 percent of us are crowded into only 8 percent of the land.
I watched a couple of clips and got bored quickly as they went through a bunch of numbers. It seems like a rehash of Yahoo and Huffington Post lists with jokes. I'm setting my expectations low, but maybe there'll be more to it in the full episodes.
What used to be a small specialty in a few newsrooms has grown some larger wings in the past couple of years. The challenge though is that a lot of journalists aren't used to handling, let alone analyzing, a lot of data. The free and open source Data Journalism Handbook, a set of guides and case studies, hopes to help with that.
It was born at a 48-hour workshop at MozFest 2011 in London. It subsequently spilled over into an international, collaborative effort involving dozens of data journalism's leading advocates and best practitioners, including contributors from the Australian Broadcasting Corporation, the BBC, the Chicago Tribune, Deutsche Welle, the Guardian, the Financial Times, Helsingin Sanomat, La Nacion, the New York Times, ProPublica, the Washington Post, the Texas Tribune, Verdens Gang, Wales Online, Zeit Online and many others.
At a glance, looks like a promising resource, even if you're not a journalist.
We intentionally and unintentionally put data in places like Facebook and Google but most of us don't think much of it. In an interview with The Guardian, Tim Berners-Lee, inventor of the Web, says why you should care.
"My computer has a great understanding of my state of fitness, of the things I'm eating, of the places I'm at. My phone understands from being in my pocket how much exercise I've been getting and how many stairs I've been walking up and so on."
Exploiting such data could provide hugely useful services to individuals, he said, but only if their computers had access to personal data held about them by web companies. "One of the issues of social networking silos is that they have the data and I don't … There are no programmes that I can run on my computer which allow me to use all the data in each of the social networking systems that I use plus all the data in my calendar plus in my running map site, plus the data in my little fitness gadget and so on to really provide an excellent support to me."
Of course, getting users to see that is easier said than done. And until they do, the incentive for companies to provide such a service is low. In turn, it's hard for data people to make a case to users, and you end up with a lot of hand waving. Challenge accepted?
Shan Carter, who makes interactive graphics for The New York Times, talks telling stories with data in his aptly named presentation, "How I tried for years to find the perfect form for interactive graphics, how I failed, and why, whether a perfect form exists or not, I've stopped my desperate pursuit."
He starts with finding a balance between statistical analysis and story, and then finishes with the kicker that visualization is a form of communication just like a movie or a book. And that carries with it its own implications.
The short Q&A at the end is pretty good, too. Just ignore the first obligatory question on how you make graphics that get more traffic.
Cathy O'Neil on when there's enough data to justify a data scientist in the workplace:
Too much to fit on an Excel spreadsheet. And it’s not just how much, it’s really about how high quality the data is; the best is for it to be clean and for it to not be public, or at least not generally used for the purpose that your business uses it for.
Even data that does fit in Excel can be examined more closely. Then again, if you only have that much data, your data scientist will get bored quickly.
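O'Neil's rule of thumb can be made concrete with a quick check against Excel's hard worksheet limits, which for the current .xlsx format are 1,048,576 rows by 16,384 columns. The function name and example figures below are just for illustration; the threshold is only a first-pass heuristic, not a hiring criterion.

```python
# Heuristic for the rule of thumb above: does the dataset even fit
# on a single Excel worksheet? Current .xlsx limits per sheet:
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384

def fits_in_excel(n_rows, n_cols):
    """Return True if the data fits on one worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS

print(fits_in_excel(500_000, 40))     # True: spreadsheet territory
print(fits_in_excel(50_000_000, 40))  # False: past the spreadsheet
```

Of course, plenty of data that clears the row limit is still painful to analyze in a spreadsheet, which is O'Neil's other point about quality and fit for purpose.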
Note from Nathan: Last week, visualization researchers from all over gathered in Providence, Rhode Island for VisWeek 2011. One of the workshops, Telling Stories with Data, focused on data as narrative and what that means for visualization. This is a guest post by the organizers: Nick Diakopoulos, Joan DiMicco, Jessica Hullman, Karrie Karahalios, and Adam Perer.
"Data storytelling" is all the rage on websites ranging from international news outlets, to political and economic organizations, to personal blogs. Indeed, this trend has captured the attention of those who research and work in information visualization. Scores of both aspiring and seasoned visual storytellers descended on the Telling Stories with Data workshop that we organized this year (the 2nd installment of the workshop) to discuss and learn about visualization storytelling tools, issues, and contexts. The workshop took place in Providence, Rhode Island on October 23rd and was part of the yearly international VisWeek conference which itself drew about 1,000 attendees.
As in many technological fields, those interested in "narrative visualization" face the challenge of connecting with like-minded others across the oft un-negotiated boundary between academic research and practical applications or designs. Yet these groups have much to learn from one another. To bring visualization research in contact with visualization practice, we structured the workshop line-up of speakers to include both academicians (e.g. from Harvard, UC Berkeley, UIUC) and people from industry (e.g. New York Times, Microsoft Research, OECD, Workbook Project). The talks were organized into three blocks: (1) tools for structuring and sharing, (2) communicating with visualization, and (3) storytelling in context.
Jacob Harris, a New York Times senior software architect, rants about how people like to use word clouds to tell stories:
Of course, the biggest problem with word clouds is that they are often applied to situations where textual analysis is not appropriate. One could argue that word clouds make sense when the point is to specifically analyze word usage (though I’d still suggest alternatives), but it’s ludicrous to make sense of a complex topic like the Iraq War by looking only at the words used to describe the events. Don’t confuse signifiers with what they signify.
Harris says he dies a little inside every time he sees a word cloud presented as insight. Hopefully his computer doesn't catch a virus that permanently changes his wallpaper, screensaver, and every text document he's ever written into word clouds, or yes, he would die a little inside many times and effectively die a lot inside so much that it might show on the outside.
Dramatics aside, I have to admit it is amusing when I get emails from people who think they have found the holy trinity of analysis, ease-of-use, and aesthetics that is Wordle. It was never intended as a serious analysis tool. Word clouds were originally made popular as a way to navigate tags for bookmarks, but other than that they're more of a toy and should be treated that way.
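Harris's point about signifiers is easy to demonstrate with a toy example (the sentences below are mine, not his): two statements with opposite meanings produce identical word frequencies, so they would render as identical word clouds.

```python
from collections import Counter

# Opposite meanings, identical word counts -> identical word clouds.
a = "the man bit the dog"
b = "the dog bit the man"

print(Counter(a.split()) == Counter(b.split()))  # True
```

Frequency counting throws away word order, and with it most of the meaning, which is exactly why a word cloud of war reports can't tell you much about the war.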
A few months ago, a packed crowd gathered in Minneapolis for the Eyeo Festival to hear some of the best in data art, visualization, and creative code talk about what they do and how they do it. I didn't get a chance to go, but from all the chatter online during the event (and the stellar speaker lineup), I get the sense I missed something good. Luckily, some of the talks are available online.
For starters, Ben Fry and Casey Reas talk about Processing, their grad school grown programming language; Aaron Koblin presents some of the work from the Google Data Arts team; and Nicholas Felton discusses the process behind his annual reports.
Catch a few more on the Eyeo Vimeo channel, or follow it to stay updated when new videos are uploaded.
If you're good with data and looking for a job, you're in luck. There seem to be quite a few jobs out there. Here are a handful of positions that have shown up on my radar recently.
SENSEable City Lab at MIT — "The SENSEable City Laboratory is seeking exceptional candidates to fill positions involving research on the process of data visualization. Candidates should have a sound experience in the process of visualizing data both in static as well as dynamic form."
Front-end Developer at Periscopic — "A passion for dealing with data, making sense of large amounts of disparate information, or statistical analysis would be lovely."
Stamen Developer — "You're excited by the possibility of cutting and bending data to fit it through the thin straw of the internet. You can look at a source of information and model it as resources, rows and columns, messages and queues."
News Developer Jobs — A pretty good list started by Matt Waite that others can edit. Includes openings at the Chicago Tribune, New York Times, Boston Globe, and others.
Got a job you need to fill? Feel free to post it in the forums.
Following the success of the Strata conference earlier this year here on the west coast, O'Reilly is hosting another event from September 19 to 23. This time it's in New York.
If you're planning on going, I suggest you register now and save a few hundred dollars. Tomorrow is the last day for early bird registration. Plus, FlowingData readers can use the discount code FLOW at checkout for an additional 20% off (and support FlowingData in the process).
This time around there's the two-day conference on the 22nd and 23rd just like before, but there's also Strata Jumpstart on the 19th, which is "a crash course for managers, strategists, and entrepreneurs on how to manage the data deluge that's transforming traditional business practices across the board--in finance, marketing, sales, legal, privacy/security, operations, and HR." On the 20th and 21st, there's an invite-only summit.
So what you could do is go to Jumpstart, hang out in the amazing city of New York for a couple of days, and then round out the week with some interesting data talks and meetups.
If it's anything like the west coast conference — and I'm sure it will be — it'll be worth the time. When I went in February, I thought it'd be really business-y, but it turned out to be an all-around fun event.
Register here, and be sure to use FLOW to get the extra 20% discount.
Competitive intelligence is poised to offer data scientists increasing job opportunities in coming years. SCIP reports that the market for business intelligence is worth approximately $2 billion annually, and Garrison says that many corporations now operate their own competitive intelligence divisions.
Plus, there's an estimated shortage of 140,000 to 190,000 people qualified for the available openings (not all in business). What you need to know to get hired:
As part of a relatively new field, data scientists may come from many different backgrounds. Garrison says that employers are often looking for two things when considering a job applicant. "The first part is the technical background," he says. Companies may want professionals with an industry background who are familiar with its specific jargon and trends. "If you want to work for a pharmaceutical company, you might need a degree in biochemistry," he explains. Other jobs may require only a general degree in business.
In other words, you need to know statistics and know or be able to learn about the subject matter. Programming skills are a plus. Actually, programming is required. I don't know any data scientists who don't have that skill. I hear there's some book to help you get started though.
Pete Warden, for O'Reilly Radar, compares current data responsibilities with those of harbor masters from the Victorian era. Warden warns:
Specialists like us who can understand and interpret data are in a privileged position. Most people have an exaggerated respect for arguments expressed as numbers or visualizations, because they don't understand how many assumptions and simplifications go into these creations. It's our job to remember that and balance our enthusiasm about the power of our techniques with some humility about their limits.
In other words: You should learn statistics. You don't have to go out and get a PhD, but it's helpful to be able to think like a statistician, so that you know the right way to think about data.