Retrieving Your Top Genres on Spotify

For the past several years, Spotify has been packaging and releasing data about your listening history at the end of every year - either through a “Year in Review” that features an overview of your listening history, or through a playlist featuring your most played songs. As someone who switched from iTunes to Spotify[1] in 2010, I always missed iTunes’ play count feature, and was happy to see Spotify expose my most listened-to songs when they first started making Year in Review playlists. Last year, their Year in Review included information about your most listened-to genres. On its own, the list of genres was fun to see, but did not reveal any particularly enlightening insights about my music preferences; however, the accompanying genre playlists that Spotify generated were awesome to listen to, and introduced me to several songs that I love and might not have otherwise found.

The playlists are generated from a music classification algorithm created by Glenn McDonald of Every Noise At Once[2]. When I first found Every Noise At Once a few years ago, I did what I imagine most visitors to the website do, and immediately went to the genres that I thought I listened to. But what about genres on the map with names that I wasn’t familiar with, but that nonetheless featured prominently in the music I already listened to?

A huge problem area for tech companies that focus on media distribution is that of recommending media for a user given their previous experiences, using data from other users. Despite the proliferation of machine learning techniques - and their incorporation into the recommendation systems found at your nearest media streaming service - I and many others still find individualized, heuristic-based approaches to work far better. While I hold out for the dystopic and bittersweet day where an algorithm knows my tastes better than me or my friends do, in the meantime, I would like to know all the Spotify genres I’ve ever listened to, so I can use Every Noise At Once to find new songs to listen to and take my music discovery into my own hands.

Although Spotify does not expose your entire listening history through their API, you can use several endpoints to retrieve your liked tracks and playlists. I frequently use my liked tracks to store my music, much like an iTunes library, so I chose to write a script to retrieve all my liked tracks and their corresponding genres.

Sign in through Spotify below to see what your genres are! The generation may take a few seconds, depending on how many tracks you liked. Once the list appears, you can navigate to the Every Noise At Once link for that genre by clicking on it.

Some Caveats

  1. As I stated earlier, my script retrieves your most common genres based on your liked tracks. If you don’t store your music on Spotify by liking it, then this list will be incomplete. Thankfully, it’s pretty easy to like tracks in bulk on Spotify - you can copy a chunk of tracks and paste directly into the “Songs” section of your Spotify library. Unfortunately, Spotify does not expose a user’s entire listening history through their API, which means you can’t get your most listened to genres. Because Spotify’s Year in Review exposes your most listened to genres and my script exposes your most liked genres, there may be some discrepancies in the top genres between the two. I considered, briefly, the idea of writing a script that I could run locally that would send my current Spotify activity to a remote server, which would just store my entire history, but decided against it. The juice does not seem worth the squeeze since I “like” every song I wish to listen to more than once, and I’ve already moved from Spotify to Tidal!

  2. Spotify does not have data on genres for songs or albums, despite claiming they do in their API documentation. My genre list is generated using each song’s artist’s genres. If you have multiple songs by one artist, that artist will only be counted once. If a song contains multiple artists, the genres of those artists will be weighted equally in the final count - I do not give more weight to the genres of the “original” artist than the genres of the “featuring” artists.

  3. Each artist may be (and usually is) linked to more than one genre.
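
To make the counting rules in these caveats concrete, here is a small sketch on made-up data. The track names, artist IDs, and genre lists are all hypothetical; only the counting logic is meant to mirror what the script does (each artist counted once, every artist on a track weighted equally, several genres per artist).

```javascript
// Hypothetical liked tracks: artist1 appears on two songs, Song C has two artists.
const likedTracks = [
  { name: 'Song A', artistIds: ['artist1'] },
  { name: 'Song B', artistIds: ['artist1'] },
  { name: 'Song C', artistIds: ['artist2', 'artist3'] },
];

// Hypothetical genre lists; each artist may carry more than one genre.
const artistGenres = {
  artist1: ['indie pop', 'dream pop'],
  artist2: ['hip hop'],
  artist3: ['hip hop', 'trap'],
};

// A Set keeps each artist once, no matter how many liked songs they have.
const uniqueArtists = new Set(likedTracks.flatMap(track => track.artistIds));

// Every genre of every unique artist adds one to that genre's count.
const genreCounts = {};
for (const artistId of uniqueArtists) {
  for (const genre of artistGenres[artistId]) {
    genreCounts[genre] = (genreCounts[genre] || 0) + 1;
  }
}

console.log(genreCounts);
// indie pop: 1, dream pop: 1, hip hop: 2, trap: 1
```

Note that artist1 contributes only once even though two of their songs are liked, and that the "featured" artist on Song C counts exactly as much as the main one.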

Signing in as a Spotify user

Spotify has several different types of authorization workflows, which are explained here. For this app, we use the implicit grant flow, which lets this web app temporarily get authorization from the Spotify user and run some client-side code with the received access token. The code below is adapted from this template for an implicit grant workflow.

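When the window for this webpage loads, we check to see if we have an access token available. Spotify hands the token back in the hash fragment of the URL (the part after the #), so we parse that fragment into key-value pairs, pull out access_token, and use it to authorize ourselves to retrieve user data.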

// Access token shared by every request below; stays undefined until the user logs in.
let token;

window.onload = function() {
  token = get_token();
};

function get_token() {
  const hash = window.location.hash
  .substring(1)
  .split('&')
  .reduce(function (initial, item) {
    if (item) {
      let parts = item.split('=');
      initial[parts[0]] = decodeURIComponent(parts[1]);
    }
    return initial;
  }, {});
  window.location.hash = '';

  let _token = hash.access_token;
  return _token;
}
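
For comparison, here is a minimal sketch of the same parsing step written with the built-in URLSearchParams class. The helper name parseHashFragment and the sample fragment are made up for illustration, and this is an alternative sketch, not the code the page actually runs. One difference to be aware of: URLSearchParams treats "+" as a space, which the manual decodeURIComponent version above does not.

```javascript
// Hypothetical helper: turn a fragment such as
// "#access_token=abc123&token_type=Bearer&expires_in=3600"
// into a plain object of key-value pairs.
function parseHashFragment(fragment) {
  // Drop the leading "#", if present, then let URLSearchParams split and decode.
  const params = new URLSearchParams(fragment.replace(/^#/, ''));
  return Object.fromEntries(params.entries());
}

// Sample fragment in the shape the implicit grant flow redirects back with.
const parsed = parseHashFragment('#access_token=abc123&token_type=Bearer&expires_in=3600');
console.log(parsed.access_token); // abc123
```

In the browser, you would pass window.location.hash instead of the sample string.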

However, when first accessing this URL, there isn’t an access token available, because you, the user, haven’t authorized us yet! You probably clicked on the “Login with Spotify” button above (if you haven’t, you should! You’ll get to see all your Spotify genres!). When this button is clicked, the page redirects to Spotify’s authorization page and, once Spotify completes authorization, redirects you back to this URL with an access token in the URL’s hash fragment. Now, we can read the token out of that hash using the code we discussed above.

loginButton.addEventListener('click', () => {
  const authEndpoint = 'https://accounts.spotify.com/authorize';

  const clientId = our_client_id;
  const redirectUri = 'https://sofiya.io/blog/genres';
  const scopes = [
    'user-top-read',
    'user-library-read'
  ];

  if (!token) {
    window.location = `${authEndpoint}?client_id=${clientId}&redirect_uri=${redirectUri}&scope=${scopes.join('%20')}&response_type=token&show_dialog=true`;
  }
});
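
The click handler above drops the redirect URI into the query string as-is. A slightly more defensive sketch percent-encodes every value before joining them. The function name buildAuthorizeUrl and the client ID 'my-client-id' are placeholders I made up; the endpoint, scopes, and parameters are the ones used above.

```javascript
// Hypothetical helper: build the authorization URL with every value percent-encoded.
function buildAuthorizeUrl(clientId, redirectUri, scopes) {
  const query = [
    ['client_id', clientId],
    ['redirect_uri', redirectUri],
    ['scope', scopes.join(' ')], // the space becomes %20 once encoded
    ['response_type', 'token'],
    ['show_dialog', 'true'],
  ]
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');

  return `https://accounts.spotify.com/authorize?${query}`;
}

// 'my-client-id' is a placeholder, not a real client ID.
const authorizeUrl = buildAuthorizeUrl(
  'my-client-id',
  'https://sofiya.io/blog/genres',
  ['user-top-read', 'user-library-read']
);
console.log(authorizeUrl);
```

Encoding matters most if the redirect URI ever gains its own query string, since an unescaped "&" or "?" would otherwise be read as part of the outer URL.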

Getting artists from your saved tracks

After Spotify has authorized us and we have a valid access token, we want to get a list of artists based on your saved tracks. Because the Spotify API returns paginated results, getArtistsFromSavedTracks is a recursive function that takes a URL and a set of artists as input. It performs a GET request on that URL, processes the returned JSON by adding all the returned artists to the set, and, if Spotify provides a URL for the next page of results (the next field of the response), calls itself again with that URL and the updated set of artists. The function returns a promise that resolves to the completed set.

function getArtistsFromSavedTracks(url, setOfArtists) {
  return fetch(url,
    { headers: {'Authorization': 'Bearer ' + token }}
  )
  .then(function(data) {
    return data.json();
  }).then(function(data_json) {
    data_json.items.forEach(track => {
      track.track.artists.forEach(artist => {
        setOfArtists.add(artist.id);
      });
    });

    if (data_json.next) {
      return getArtistsFromSavedTracks(data_json.next, setOfArtists);
    } else {
      return setOfArtists;
    }
  });
}
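
The same pagination logic can also be written as a loop with async/await, which some readers find easier to follow than recursion. This is only a sketch: collectArtistIds and fetchJson are names I made up, and fetchJson stands in for "fetch the URL with the Authorization header and parse the JSON". The stubbed pages below are invented so that the example runs without touching the real API.

```javascript
// Hypothetical loop-based variant: follow each page's "next" URL until it runs out.
async function collectArtistIds(firstUrl, fetchJson) {
  const artistIds = new Set();
  let url = firstUrl;

  while (url) {
    const page = await fetchJson(url);
    for (const item of page.items) {
      for (const artist of item.track.artists) {
        artistIds.add(artist.id);
      }
    }
    url = page.next; // null on the last page, which ends the loop
  }

  return artistIds;
}

// Invented two-page response in the same shape the code above reads.
const fakePages = {
  'page-1': { items: [{ track: { artists: [{ id: 'a1' }, { id: 'a2' }] } }], next: 'page-2' },
  'page-2': { items: [{ track: { artists: [{ id: 'a2' }, { id: 'a3' }] } }], next: null },
};
const fakeFetchJson = url => Promise.resolve(fakePages[url]);

collectArtistIds('page-1', fakeFetchJson).then(ids => console.log([...ids]));
// logs the three unique IDs: a1, a2, a3
```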

Getting genres from artists

Once all the artists are collected, we want to retrieve each artist’s corresponding genre(s). The getArtistGenreFromArtists function takes a “batch” of artists - a subset of the original set of artists - and returns a dictionary of each genre and how often it appears in your liked artists. This function, like the previous function, returns a promise.

function getArtistGenreFromArtists(artistBatch) {
  // The ids parameter is a comma-separated list of artist IDs.
  return fetch("https://api.spotify.com/v1/artists/?ids=" + artistBatch.join(','),
    { headers: {'Authorization': 'Bearer ' + token }}
  ).then(function(data) {
    return data.json();
  }).then(function(data) {
    let genreDict = {};
    data.artists.forEach(artist => {
      genreDict = addToGenreDict(artist.genres, genreDict);
    })
    return genreDict;
  });
}

function addToGenreDict(genres, genreDict) {
  genres.forEach(genre => {
    // Key on the genre name itself, not on the string literal 'genre'.
    genreDict[genre] = (genreDict[genre] || 0) + 1;
  });
  return genreDict;
}

In the code above, the line genreDict[genre] = (genreDict[genre] || 0) + 1; is equivalent to the following few lines:

if (genreDict[genre]) {
  genreDict[genre] += 1;
} else {
  genreDict[genre] = 1;
}

Tying it together

Right now, we have getArtistsFromSavedTracks, a function that retrieves a set of artists from your saved tracks, and getArtistGenreFromArtists, a function that constructs a dictionary mapping genres to their frequency given a batch of artists. We also have a function that allows the user to authorize our application to retrieve their data, and have incorporated it into our app with a click handler on the “Login with Spotify” button. After authorization, we want to call getArtistsFromSavedTracks and getArtistGenreFromArtists.

The code below contains the entirety of the final function, for completeness. This function is called when the “Get Genres” button is clicked. From the structure, we can see that this function is a series of chained promises[3]. As we step through the code, we will focus on each promise in the chain in more depth.

getGenresButton.addEventListener('click', () => {
  fetch("https://api.spotify.com/v1/me",
    { headers: {'Authorization': 'Bearer ' + token }}
  ).then(function(user) {
    let setOfArtists = new Set();
    return getArtistsFromSavedTracks(
      'https://api.spotify.com/v1/me/tracks',
      setOfArtists
    );
  }).then(function(setOfArtists) {
    let genresPromises = [...setOfArtists]
    .chunk(50)
    .map(artistBatch => getArtistGenreFromArtists(artistBatch));

    return Promise.all(genresPromises);
  }).then(function(genreDict) {
    var flattenedGenreDict = genreDict.reduce((result, currentObject) => {
      for(var key in currentObject) {
          if (currentObject.hasOwnProperty(key)) {
            result[key] = (result[key] || 0) + currentObject[key];
          }
      }
      return result;
    }, {});

    return flattenedGenreDict;
  });
});

Promise 1: Retrieving a user

First, we retrieve a user, using the extremely handy and easy-to-use fetch function[4].

fetch("https://api.spotify.com/v1/me",
  { headers: {'Authorization': 'Bearer ' + token }}
)

Promise 2: Calling the getArtistsFromSavedTracks function

Then, we call our function to get artists from a user’s saved tracks. Remember that getArtistsFromSavedTracks takes a URL and a set as parameters, and recursively calls itself to construct a complete setOfArtists; our initial URL is simply the URL for retrieving the first batch of liked tracks, and our initial set is an empty set. Further, because getArtistsFromSavedTracks returns a promise that resolves to a set, rather than just a set, we can simply return the call to that function inside our “then”; the next “then” in the chain receives the finished set, which keeps the code in this function a little more readable.

.then(function(user) {
  let setOfArtists = new Set();
  return getArtistsFromSavedTracks(
    'https://api.spotify.com/v1/me/tracks',
    setOfArtists
  );
})
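
If returning a promise from inside a “then” feels mysterious, this stripped-down sketch shows the behaviour in isolation. The delayedValue helper and the artist IDs are made up; the point is only that the second “then” receives the resolved Set, not a promise wrapping it.

```javascript
// Hypothetical stand-in for getArtistsFromSavedTracks: resolves to a value after a short delay.
function delayedValue(value) {
  return new Promise(resolve => setTimeout(() => resolve(value), 10));
}

const artistCount = Promise.resolve('pretend user response')
  .then(function(user) {
    // Returning a promise here makes the chain wait for it to resolve.
    return delayedValue(new Set(['a1', 'a2']));
  })
  .then(function(setOfArtists) {
    // setOfArtists is the Set itself, already unwrapped.
    return setOfArtists.size;
  });

artistCount.then(count => console.log(count)); // 2
```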

Promise 3: Calling the getArtistGenreFromArtists function

After we have a set of artists, we want to call getArtistGenreFromArtists on that set. However, because getArtistGenreFromArtists takes a subset of artists as a parameter, we want to batch our set into a list of subsets. Here is the entirety of the promise:

.then(function(setOfArtists) {
  let genresPromises = [...setOfArtists]
  .chunk(50)
  .map(artistBatch => getArtistGenreFromArtists(artistBatch));

  return Promise.all(genresPromises);
})

First, we use the spread operator to convert the set into an array[5].

Once we have an array of artists, we call chunk on it - a custom property for an array that we’ve defined. This custom property, taken from StackOverflow, is applied to an array and takes a “chunk size” - how many items large each chunk should be. It returns an array of arrays of artists, where each array is no larger than 50 artists.

Object.defineProperty(Array.prototype, 'chunk', {
    value: function(chunkSize) {
        var that = this;
        return Array(Math.ceil(that.length/chunkSize)).fill().map(function(_,i){
            return that.slice(i*chunkSize,i*chunkSize+chunkSize);
        });
    }
});

Although slightly more confusing than just writing an iterative function, we create a custom property on the array object so we can keep our existing logic of applying functions on structures. This has the double benefit of allowing us to hide the gruesome implementation details of chunking an array, as well as to really neatly define the steps to construct genresPromises within our promise.
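
For readers who would rather not extend Array.prototype, here is a sketch of the iterative alternative mentioned above. The name chunkArray is my own; it should produce the same batches as the chunk property, at the cost of breaking the method-chaining style.

```javascript
// Hypothetical plain-function alternative to the custom chunk property.
function chunkArray(items, chunkSize) {
  const chunks = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    // slice stops at the end of the array, so the last chunk may be shorter.
    chunks.push(items.slice(start, start + chunkSize));
  }
  return chunks;
}

const batches = chunkArray(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 3);
console.log(batches);
// three batches: a,b,c then d,e,f then g
```

With this version, the promise body would call chunkArray([...setOfArtists], 50) instead of chaining .chunk(50).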

On the returned array of arrays, we map each array to the results of calling getArtistGenreFromArtists with that array. Again, remember that getArtistGenreFromArtists returns a promise, so the final type of genresPromises is … an array of promises! We can now call the super helpful function Promise.all() with our array of promises, which won’t resolve until every single promise in that array has been resolved. Look back to our function definition for getArtistGenreFromArtists - notice how a dictionary mapping genres to frequencies is returned, once the promise resolves? Once all the promises resolve, we are left with a list of dictionaries, and while each dictionary cannot have duplicate genres[6], there may be duplicates between two dictionaries. Because we want to end up with one dictionary containing all the genres and their aggregated frequencies, we need to flatten our array.
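
Here is a small, self-contained sketch of what Promise.all hands back. The fakeGenreLookup helper and the genre counts are invented stand-ins for getArtistGenreFromArtists, so nothing here touches the Spotify API.

```javascript
// Hypothetical stand-in for getArtistGenreFromArtists:
// resolves to a genre dictionary after a delay.
function fakeGenreLookup(genreDict, delayMs) {
  return new Promise(resolve => setTimeout(() => resolve(genreDict), delayMs));
}

const allBatches = Promise.all([
  fakeGenreLookup({ rock: 2, pop: 1 }, 20), // finishes second
  fakeGenreLookup({ rock: 1, jazz: 3 }, 5), // finishes first
]);

allBatches.then(dicts => console.log(dicts));
// The results arrive in the order of the input array, not in completion order.
```

Notice that "rock" shows up in both dictionaries, which is exactly the duplication between batches described above.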

The Final Promise: Flattening our array of dicts

Finally, to flatten our array, we use a good ol’ reduce function. For every dictionary in genreDict, we iterate through the genres, and check whether our “reduced dictionary” (here, the variable result) contains that key. If it does exist, we add currentObject[key] to the frequency we already have; if it doesn’t exist, we add currentObject[key] to 0. In the end, we are left with just result, which becomes our flattened dictionary.

.then(function(genreDict) {
  var flattenedGenreDict = genreDict.reduce((result, currentObject) => {
    for(var key in currentObject) {
        if (currentObject.hasOwnProperty(key)) {
          result[key] = (result[key] || 0) + currentObject[key];
        }
    }
    return result;
  }, {});

  return flattenedGenreDict;
});
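
To see the flattening step on its own, here is a sketch with invented counts. mergeGenreCounts is a made-up name, and it uses Object.entries rather than for...in with hasOwnProperty, but the merging rule is the same one described above.

```javascript
// Hypothetical standalone version of the flattening step.
function mergeGenreCounts(genreDicts) {
  const merged = {};
  for (const dict of genreDicts) {
    // Object.entries only visits the dictionary's own keys.
    for (const [genre, count] of Object.entries(dict)) {
      merged[genre] = (merged[genre] || 0) + count;
    }
  }
  return merged;
}

const mergedCounts = mergeGenreCounts([
  { rock: 2, pop: 1 },
  { rock: 1, jazz: 3 },
]);
console.log(mergedCounts);
// rock: 3, pop: 1, jazz: 3
```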

Some final words

And that’s it! On top of the code above, the JavaScript on this page contains some error handling and a lot of code for generating and displaying the table of genres once the dictionary mapping genres to frequencies has been generated. Hopefully this walk-through and demo has been informative, either because you found your favourite Spotify genres interesting or because you learned a bit about JavaScript along the way (or both!). Until next time!
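
The table code is not shown in this post, so the following is only a guess at one step it plausibly needs: turning the flattened dictionary into rows ordered from most to least common. The function name sortGenresByCount and the sample counts are made up.

```javascript
// Hypothetical helper: turn { genre: count } into [genre, count] rows, highest count first.
function sortGenresByCount(genreDict) {
  return Object.entries(genreDict).sort((a, b) => b[1] - a[1]);
}

const rows = sortGenresByCount({ pop: 1, rock: 3, jazz: 2 });
rows.forEach(([genre, count]) => console.log(genre + ': ' + count));
// rock: 3
// jazz: 2
// pop: 1
```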


Footnotes

  1. And, as of late 2017, I’ve switched to Tidal! If you’re thinking of switching from Spotify to Tidal, as well, consider using my Spotify to Tidal converter to import all your Spotify playlists and songs into Tidal. 

  2. It is unclear to me whether Glenn works at Spotify and develops Every Noise At Once as part of his work there or whether it was acquired by Spotify, but regardless, each extremely specific genre has a dedicated Spotify playlist. 

  3. Much has been written about promises on the internet, and I’m neither knowledgeable enough nor, frankly, willing to contribute. Promises, and the idea of promise chaining, took me an extremely long amount of time to understand, and as far as I know, I really have only scratched the surface. If you, like I did, find yourself desperately searching the internet for that one article that will contribute to your promises “a-ha!” moment, this website might be useful. It really cleared up a lot of my misconceptions, and I reference it often. 

  4. The fetch API is good and I refuse to use XMLHttpRequest. This is a hill I am willing to die on. 

  5. For some more commentary on the problem of “converting sets into arrays in JavaScript”, see this StackOverflow answer.

  6. Because dictionaries are hashmaps. 
