Visualize Music Collections With MusicBox

Posted to Visualization  |  Nathan Yau

The great thing about being a graduate student is that you get to experiment. Anita Lillie, from the MIT Media Lab, demos MusicBox, her master’s thesis project that visualizes and maps music collections based on songs’ acoustic features. As might be expected, she uses principal components analysis to arrange songs. Each dot represents a song. If two songs sound similar, they should appear close to each other. As an example, the above dots are colored by music genre. Rap songs appear on the left in red while classical appears on the right.

As an aside, Anita’s project reminds me a lot of a GGobi demo by Di Cook. She used the tuneR library in R to quantify Beatles songs and then used GGobi to do something similar to MusicBox. R and GGobi are free to use, so if you’re interested in visualizing your own music library, you might want to check them out.
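
If you are curious what the "arrange songs with PCA" step might look like, here is a minimal sketch in plain Python. It is not Anita's code and it does not use tuneR or GGobi. The song names and the four feature values per song are made up for illustration, and the helper names (`center`, `covariance`, `top_component`, `pca_2d`) are my own. It finds the top two principal components with power iteration and projects each song onto them, so similar-sounding songs should land near each other.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def top_component(cov, iterations=500, seed=0):
    """Power iteration: return (unit eigenvector, eigenvalue) for the
    largest eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(w, w))
        if norm == 0:
            break  # zero matrix: no direction to find
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in cov])
    return v, eigenvalue


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, ev1 = top_component(cov, seed=0)
    # Deflation: remove the first component, then find the next one.
    dims = len(cov)
    deflated = [[cov[i][j] - ev1 * pc1[i] * pc1[j] for j in range(dims)]
                for i in range(dims)]
    pc2, _ = top_component(deflated, seed=1)
    return [(dot(r, pc1), dot(r, pc2)) for r in centered]


# Hypothetical acoustic features per song, all scaled to 0..1.
songs = {
    "rap_a":       [0.90, 0.80, 0.20, 0.10],
    "rap_b":       [0.85, 0.75, 0.25, 0.15],
    "rock_a":      [0.60, 0.70, 0.50, 0.40],
    "classical_a": [0.10, 0.20, 0.80, 0.90],
    "classical_b": [0.15, 0.25, 0.90, 0.85],
}

names = list(songs)
coords = dict(zip(names, pca_2d([songs[n] for n in names])))

for name, (x, y) in coords.items():
    print(f"{name:12s} x={x:+.3f} y={y:+.3f}")
```

Plot the `(x, y)` pairs as dots and color them by genre, and you have a very rough cousin of the MusicBox view. Real acoustic features would come on different scales, so you would likely want to standardize them before running PCA.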

[via TechCrunch]


  • Hi Nathan,

    Just a question (mostly to test my knowledge), but why do you feel it’s obvious that she used principal components? From my understanding, that technique simply rearranges the variance in a system. My approach would have been to decompose the variance into factors and cluster the songs on the factors.

    Like I said, interested in hearing your thoughts. I am not a PhD, just a data dork.

    Thanks for your blog!

  • It would appear, based on the legend in the bottom left, that ‘Rap’ music is furthest to the left in red, while ‘Rock’ music sits at the mid-range level in cyan.

    And this idea is fun, random data made aesthetically pleasing. I’ve got to ask though, what would you use this information for?

  • I like how you can see the amount of variation within each genre. Most of the classical music is grouped really close together and the same seems true of country music. Rock and rap on the other hand each have a wider spread which should mean there’s a lot more variation in sound in those genres.

    Too bad she didn’t mouse over that one classical song that’s off on its own in the top right. It’s got my curiosity.

  • @Brock – I didn’t really have any statistical (legitimate) reason for saying that, but PCA seems to be a favorite among computer scientists and designers who want to reduce a multidimensional dataset.

    @Zack – oops. Thanks. I’m illiterate.

  • @Zack – Oh, and as for use, well, that’s why I mentioned the part about being a graduate student. You get to play :). However, I could see this being useful for, say, discovering new music. You might have a lot of music that you’ve never heard and want to find songs that are similar to those that you like. For example, Pandora uses a similar algorithm to find songs you might like. You could imagine MusicBox as a different type of interface into Pandora Radio.

  • I haven’t seen the video yet – but from the screenshot in the post, it’s interesting that the data seems to follow a single dimension. I would have thought more dimensions were needed.
    Maybe she used a varimax rotation?

