Visualize Music Collections With MusicBox

Dec 18, 2008

The great thing about being a graduate student is that you get to experiment. Anita Lillie, from the MIT Media Lab, demos MusicBox, her master’s thesis project, which visualizes and maps music collections based on songs’ acoustic features. As might be expected, she uses principal component analysis (PCA) to arrange the songs. Each dot represents a song, and if two songs sound similar, they should appear close to each other. As an example, the dots above are colored by music genre: rap songs appear on the left in red, while classical appears on the right.
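For the curious, here's a minimal sketch of what a PCA layout like this might look like in Python. It assumes each song has already been reduced to a numeric feature vector. The post doesn't say which acoustic features MusicBox extracts, so the input here is hypothetical. The helper name `pca_2d` is made up for illustration, and standardizing each feature first is a choice made for this sketch, not necessarily what MusicBox does. The output is one (x, y) point per song, so songs with similar feature vectors land near each other.

```python
import numpy as np

def pca_2d(features):
    """Project songs onto their first two principal components.

    features: array of shape (n_songs, n_features), one row per song.
    Returns an (n_songs, 2) array of x, y coordinates for plotting.
    """
    X = np.asarray(features, dtype=float)
    # Center each feature, then scale it to unit variance so that
    # features measured on large scales don't dominate the projection.
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features alone instead of dividing by zero
    X = X / std
    # The rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T

# Usage with made-up data: 100 songs, 12 features each.
rng = np.random.default_rng(0)
coords = pca_2d(rng.normal(size=(100, 12)))
print(coords.shape)  # (100, 2)
```

Plotting `coords[:, 0]` against `coords[:, 1]` and coloring each point by genre would give a map in the spirit of the screenshot.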

As an aside, Anita’s project reminds me a lot of a GGobi demo by Di Cook. She used the tuneR library in R to quantify Beatles songs and then used GGobi to do something similar to MusicBox. R and GGobi are free to use, so if you’re interested in visualizing your own music library, you might want to check them out.

[via TechCrunch]


  • Hi Nathan,

    Just a question (mostly to test my knowledge), but why do you feel it’s obvious that she used principal components? From my understanding, that technique simply rearranges the variance in a system. My approach would have been to decompose the variance into factors and cluster the songs on those factors.

    Like I said, interested in hearing your thoughts. I am not a PhD, just a data dork.

    Thanks for your blog!

  • It would appear, based on the legend in the bottom left, that ‘Rap’ music is furthest to the left in red, while ‘Rock’ music sits at the mid-range level in cyan.

    And this idea is fun: random data made aesthetically pleasing. I’ve got to ask, though, what would you use this information for?

    I like how you can see the amount of variation within each genre. Most of the classical music is grouped really close together, and the same seems true of country music. Rock and rap, on the other hand, each have a wider spread, which should mean there’s a lot more variation in sound within those genres.

    Too bad she didn’t mouse over that one classical song that’s off on its own in the top right. It’s got my curiosity.

  • @Brock – I didn’t really have any statistical (legitimate) reason for saying that, but PCA seems to be a favorite among computer scientists and designers who want to reduce a multidimensional dataset.

    @Zack – oops. Thanks. I’m illiterate.

    @Zack – Oh, and as for uses, well, that’s why I mentioned the part about being a graduate student. You get to play :). However, I could see this being useful for, say, discovering new music. You might have a lot of music that you’ve never heard and want to find songs that are similar to those you like. For example, Pandora uses a similar algorithm to find songs you might like. You could imagine MusicBox as a different type of interface into Pandora Radio.

    I haven’t seen the video yet, but from the screenshot in the post, it’s interesting that the data seems to follow a single dimension. I would have thought more dimensions were needed.
    Maybe she used a varimax rotation?
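The discovery idea from the comments (find songs that sit close to ones you already like) can be sketched as a simple nearest-neighbor lookup. This is a hypothetical helper, `most_similar`, assuming each song already has coordinates, such as 2-D positions on a MusicBox-style map, and that plain Euclidean distance is a fair stand-in for "sounds similar." It is only an illustration, not how MusicBox or Pandora actually recommend music.

```python
import numpy as np

def most_similar(coords, liked_index, k=5):
    """Return the indices of the k songs closest to the song at liked_index.

    coords: array of shape (n_songs, d), one point per song.
    "Closest" means smallest Euclidean distance, an assumption for this sketch.
    """
    X = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(X - X[liked_index], axis=1)
    dists[liked_index] = np.inf  # don't recommend the song you already like
    order = np.argsort(dists, kind="stable")
    return order[: min(k, len(X) - 1)].tolist()

# Usage with made-up 2-D positions for five songs.
songs = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.5, 0.0], [10.0, 10.0]]
print(most_similar(songs, liked_index=0, k=2))  # [3, 1]
```

With a real library you would feed in the map coordinates (or the full feature vectors) and get back the handful of songs nearest to a favorite.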