How Netflix creates movie micro-genres

Posted to Statistics  |  Nathan Yau

Alexis Madrigal and Ian Bogost for The Atlantic reverse engineered the Netflix genre generator, analyzed the data, and then made their own. Then they talked to Todd Yellin, the guy at Netflix who created the micro-genre system. It's no accident when you see altgenres like "Visually-striking Goofy Action & Adventure" and "Sentimental set in Europe Dramas from the 1970s" in your browser.

The Netflix Quantum Theory doc spelled out ways of tagging movie endings, the "social acceptability" of lead characters, and dozens of other facets of a movie. Many values are "scalar," that is to say, they go from 1 to 5. So, every movie gets a romance rating, not just the ones labeled "romantic" in the personalized genres. Every movie's ending is rated from happy to sad, passing through ambiguous. Every plot is tagged. Lead characters' jobs are tagged. Movie locations are tagged. Everything. Everyone.

That's the data at the base of the pyramid. It is the basis for creating all the altgenres that I scraped. Netflix's engineers took the microtags and created a syntax for the genres, much of which we were able to reproduce in our generator.
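The passage above describes two layers: scalar tags on every movie, scored from 1 to 5, and a syntax that combines those tags into genre names. Below is a minimal Python sketch of that idea. It rests on several assumptions. The facet names (`visual_strikingness`, `goofiness`, `sentimentality`) are made up. A score of 4 or higher is assumed to turn a facet into an adjective. The word order is simplified. None of these names or rules come from Netflix or The Atlantic, and the sketch does not reproduce quirks like "Sentimental set in Europe Dramas from the 1970s."

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical facet names mapped to the adjective each one produces.
# Every facet is scored on the 1-5 scalar scale described above.
ADJECTIVES = {
    "visual_strikingness": "Visually-striking",
    "goofiness": "Goofy",
    "sentimentality": "Sentimental",
}


@dataclass
class Movie:
    title: str
    scalars: Dict[str, int] = field(default_factory=dict)  # facet -> 1..5
    noun_genre: str = "Dramas"
    setting: Optional[str] = None  # e.g. "Europe"
    decade: Optional[int] = None   # e.g. 1970

    def __post_init__(self) -> None:
        # Enforce the 1-5 scalar range for every tagged facet.
        for facet, value in self.scalars.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{facet}={value} is outside 1-5")


def adjectives_for(movie: Movie, threshold: int = 4) -> List[str]:
    """Turn high-scoring scalar facets into genre adjectives (assumed rule)."""
    return [word for facet, word in ADJECTIVES.items()
            if movie.scalars.get(facet, 0) >= threshold]


def altgenre(movie: Movie, threshold: int = 4) -> str:
    """Assemble a name: adjectives, noun genre, then setting and decade."""
    parts = adjectives_for(movie, threshold)
    parts.append(movie.noun_genre)
    if movie.setting:
        parts.append(f"set in {movie.setting}")
    if movie.decade is not None:
        parts.append(f"from the {movie.decade}s")
    return " ".join(parts)


if __name__ == "__main__":
    film = Movie("Hypothetical Film",
                 {"visual_strikingness": 4, "goofiness": 5},
                 noun_genre="Action & Adventure")
    print(altgenre(film))  # Visually-striking Goofy Action & Adventure
```

Because every movie carries a score for every facet, the same few rules can generate many distinct genre names. Changing the threshold or adding facets changes which names a given movie qualifies for.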

Be sure to play around with Bogost's generator at the top of The Atlantic's article. It will amuse.
