The great thing about being a graduate student is that you get to experiment. Anita Lillie, from the MIT Media Lab, demos MusicBox, her master’s thesis project that visualizes and maps music collections based on songs’ acoustic features. As might be expected, she uses principal components analysis to arrange songs. Each dot represents a song. If two songs sound similar, they should appear close to each other. As an example, the above dots are colored by music genre. Rap songs appear on the left in red while classical appears on the right.
As an aside, Anita’s project reminds me a lot of a GGobi demo by Di Cook. She used the tuneR library in R to quantify Beatles songs and then used GGobi to do something similar to MusicBox. R and GGobi are free to use, so if you’re interested in visualizing your own music library, you might want to check them out.
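If you'd like a feel for the arranging step, here is a minimal sketch of principal components analysis in Python with NumPy. It is not MusicBox's actual code: the function name `pca_project` and the random "acoustic features" are made up for illustration, and a real library would need features extracted from the audio first.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project each row (one song) onto the top principal components."""
    X = np.asarray(features, dtype=float)
    # Center each feature so the components describe variance around the mean.
    X = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:n_components].T
    # Share of total variance captured by each kept component.
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]

# Hypothetical data: 6 songs, 4 invented acoustic features per song.
rng = np.random.default_rng(0)
songs = rng.normal(size=(6, 4))

coords, explained = pca_project(songs)
print(coords.shape)  # one (x, y) position per song
print(explained)     # fraction of variance on each axis
```

Songs with similar feature values end up near each other in `coords`, which is the property the map relies on. The `explained` values hint at how much of the variation in the collection the two plotted axes actually capture.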
Just a question (mostly to test my knowledge), but why do you feel it’s obvious that she used principal components? From my understanding, that technique simply rearranges the variance in a system. My approach would have been to decompose the variance into factors and cluster the songs on the factors.
Like I said, interested in hearing your thoughts. I am not a PhD, just a data dork.
Thanks for your blog!
It would appear, based on the legend in the bottom left, that ‘Rap’ music is furthest to the left in red, while ‘Rock’ music sits at the mid-range level in cyan.
And this idea is fun, random data made aesthetically pleasing. I’ve got to ask though, what would you use this information for?
I like how you can see the amount of variation within each genre. Most of the classical music is grouped really close together, and the same seems true of country music. Rock and rap, on the other hand, each have a wider spread, which should mean there’s a lot more variation in sound in those genres.
Too bad she didn’t mouse over that one classical song that’s off on its own in the top right. It’s got my curiosity.
@Brock – I didn’t really have any statistical (legitimate) reason for saying that, but PCA seems to be a favorite among computer scientists and designers who want to reduce a multidimensional dataset.
@Zack – oops. Thanks. I’m illiterate.
@Zack – Oh, and as for use, well, that’s why I mentioned the part about being a graduate student. You get to play :). However, I could see this being useful for, say, discovering new music. You might have a lot of music that you’ve never heard and want to find songs that are similar to those that you like. For example, Pandora uses a similar algorithm to find songs you might like. You could imagine MusicBox as a different type of interface into Pandora Radio.
I haven’t seen the video yet – but from the screenshot in the post, it’s interesting that the data seems to follow a single dimension. I would have thought more dimensions were needed.
Maybe she used a varimax rotation?