This post is a continuation of my previous article on the Montreux Jazz Festival. In this post, we will focus on the generation of a pretty nice map of the artists of the MJF linked by their musical similarity.
You can navigate through the map of artists of the festival using your mouse or using the search box on the top right.
The search box is a bit special: you can type either an artist name or a location. As you type, suggestions autocomplete underneath. If you type a location, the suggested artists are those originating from that particular place. Use your keyboard to select an artist and the map will zoom right onto it.
The map was created from scratch using an image from NASA. The zoom level is a bit limited because I host the thousands of tiles myself :). A deeper zoom level would expand into gigabytes of data.
What does it represent?
Each node (circle) on the map represents an artist who performed at the festival. Nodes are placed on the map according to the artists' locations; note that many artists are missing because I couldn't find that information. If you hover over a node, the map displays a popup with some information about the artist and, more importantly, links to other artists of the festival. In fact, this visualization is all about displaying the network of the festival's artists linked by their musical similarity. The darker a node, the more connections it has.
How do you compute the musical similarity between artists?
Artists are actually linked together by comparing the songs they played at the festival. To compare songs, we first need to extract audio features. A feature is a compressed representation of one property of the underlying signal: for example, the BPM (beats per minute), key and mode of a song can all be considered audio features. In our case, we extract many properties from each song, such as BPM, pitch, key, mode, energy, timbre and so forth. For more background on audio features, you can look at a lecture on the subject.
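To make the idea of a feature concrete, here is a toy, numpy-only sketch that computes three simple descriptors from a raw signal. It is not the actual feature set used for the map (real pipelines rely on dedicated libraries and far richer descriptors), and the 440 Hz sine wave stands in for a real song:

```python
import numpy as np

def extract_features(signal, sr=22050):
    """Compute a few simple audio features from a mono signal.
    These are toy stand-ins for the richer descriptors (BPM, key,
    timbre, ...) mentioned above."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return {
        # RMS energy: overall loudness of the signal
        "energy": float(np.sqrt(np.mean(signal ** 2))),
        # zero-crossing rate: a rough proxy for noisiness
        "zcr": float(np.mean(np.abs(np.diff(np.sign(signal))) > 0)),
        # spectral centroid: "center of mass" of the spectrum,
        # a rough proxy for brightness (one aspect of timbre)
        "centroid": float(np.sum(freqs * spectrum) / np.sum(spectrum)),
    }

# one second of a 440 Hz sine wave, standing in for a real song
sr = 22050
t = np.arange(sr) / sr
features = extract_features(np.sin(2 * np.pi * 440 * t), sr)
```

For a pure 440 Hz tone the spectral centroid lands near 440 Hz, which is the kind of sanity check that makes these compressed representations easy to trust.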
Those features take quite a lot of time to compute! Fortunately, we only need to compute them once per song. To retrieve them efficiently whenever we want, they are stored in a database. In this particular setup I use RethinkDB, a pretty neat open-source NoSQL database. Check it out if you are interested.
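The compute-once-then-store pattern looks roughly like this. For a self-contained sketch I use the stdlib `shelve` module as a stand-in for the RethinkDB table, and a dummy `extract_features` standing in for the real (slow) analysis:

```python
import os
import shelve
import tempfile

calls = {"extract": 0}  # counts how often the slow path actually runs

def extract_features(song_id):
    """Stand-in for the real, slow audio analysis."""
    calls["extract"] += 1
    return {"bpm": 120.0, "energy": 0.8}

def get_features(db_path, song_id):
    """Fetch features from the store, computing them only on a miss.
    shelve is a stdlib stand-in for the database used in the post."""
    with shelve.open(db_path) as db:
        if song_id not in db:
            db[song_id] = extract_features(song_id)  # slow path, once per song
        return db[song_id]

db_path = os.path.join(tempfile.mkdtemp(), "features")
first = get_features(db_path, "song-123")
second = get_features(db_path, "song-123")  # cache hit, no recomputation
```

With RethinkDB the `get_features` body would become a query against a `songs` table, but the caching logic stays the same.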
Once we have retrieved the features for each song, we compare them with one another. On a small dataset, you can brute-force the comparison of every song against every other, which requires N × (N − 1) / 2 comparisons. With a medium-sized dataset like the MJF's, where N = 30,000, that already amounts to 449,985,000 comparisons! Far too costly for us.
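The pair count comes straight from the formula, since each unordered pair of songs is compared exactly once:

```python
def n_pairwise_comparisons(n):
    """Number of unordered pairs among n songs: each song is compared
    with every other song exactly once."""
    return n * (n - 1) // 2

n_pairwise_comparisons(30_000)  # 449_985_000, the figure quoted above
```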
Instead, we compute the approximate nearest neighbors of each song using random projections. For each song, this very quickly returns a fixed number of neighbors with (approximately) similar audio features. In other words, it finds songs that sound alike, but fast. The actual implementation uses Spotify’s Annoy library.
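Annoy builds forests of random-projection trees and is far more clever than this, but the core idea can be sketched in a few lines: project every song's feature vector onto a handful of random hyperplanes, key each song by the signs of those projections, and only compare a song against the others in its own bucket. Everything below (the feature dimensions, the number of planes) is illustrative:

```python
import numpy as np

def build_lsh_buckets(features, n_planes=6, seed=0):
    """Hash each feature vector by the signs of its projections onto
    random hyperplanes; similar vectors tend to share a bucket."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, features.shape[1]))
    keys = [tuple(row) for row in (features @ planes.T > 0)]
    buckets = {}
    for i, key in enumerate(keys):
        buckets.setdefault(key, []).append(i)
    return keys, buckets

def approx_neighbors(i, features, keys, buckets, k=5):
    """Approximate nearest neighbors of song i: rank only the songs
    in i's own bucket instead of the whole dataset."""
    candidates = [j for j in buckets[keys[i]] if j != i]
    dists = np.linalg.norm(features[candidates] - features[i], axis=1)
    return [candidates[j] for j in np.argsort(dists)[:k]]

# 200 random "songs" with 8 hypothetical audio features each
rng = np.random.default_rng(42)
songs = rng.standard_normal((200, 8))
keys, buckets = build_lsh_buckets(songs)
neighbors = approx_neighbors(0, songs, keys, buckets, k=3)
```

With Annoy itself, the equivalent steps are adding each feature vector to an `AnnoyIndex`, calling `build()`, and then querying `get_nns_by_item()` per song.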
Each song is thus associated with a list of similar neighbors. The perfect data structure to encode this song-to-song similarity is a graph (network), which can also be visualized.
Finally, we group the songs by artist and carry the song-to-song links over to the artists. Multiple links between the same two artists are summed into a single weighted edge. In technical terms, we reduce the graph using the generalized blockmodeling technique.
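The sum-the-parallel-edges step of this reduction is simple to sketch. The artist names and song IDs below are purely illustrative, and self-links within a single artist are dropped:

```python
from collections import Counter

def reduce_to_artist_graph(song_edges, artist_of):
    """Collapse a song-similarity graph into an artist graph: each
    song-to-song link becomes a link between the two artists, and
    parallel links between the same pair are summed into a weight."""
    weights = Counter()
    for song_a, song_b in song_edges:
        a, b = artist_of[song_a], artist_of[song_b]
        if a == b:
            continue  # ignore links between two songs of the same artist
        weights[tuple(sorted((a, b)))] += 1
    return dict(weights)

# hypothetical data: three songs, two artists, three similarity links
artist_of = {"s1": "Nina Simone", "s2": "Nina Simone", "s3": "Miles Davis"}
edges = [("s1", "s3"), ("s2", "s3"), ("s1", "s2")]
artist_graph = reduce_to_artist_graph(edges, artist_of)
# {("Miles Davis", "Nina Simone"): 2}
```

The resulting edge weights are exactly what drives the node shading on the map: the more (and heavier) connections an artist accumulates, the darker its circle.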
This concludes this post on the map of artists. I’ll make a more detailed blog post on how to create and visualize a graph of songs created by audio similarity soon.