The Data Daily

How Spotify knows a lot about you using machine learning and AI.

Last updated: 12-02-2019

Spotify is a music streaming service founded in 2006. It officially launched in India in February 2019, and it already had millions of subscribers. Spotify is known for its user experience and its music recommendations, which are continuously improving. It uses artificial intelligence, machine learning, and big data to improve and personalize the music experience for its listeners.

Spotify needs no introduction. It is one of the best music streaming services on the market. But what excites us the most is the remarkable ways it enhances the user experience.

Most of us are familiar with "Discover Weekly", a personalized playlist unique to each user. It uses artificial intelligence and machine learning algorithms to generate the playlist. It learns from your music preferences, your streaming history, and how many times you have listened to a particular song. Everyone's Discover Weekly is different, and it changes over time.

When you are listening to music, Spotify monitors whether you listen to the whole song or skip through it. Over time, it builds up an understanding of the type of music you like. It even dissects this music by beats per minute, style, type of voice, and so on. This helps users who don't have the time, energy, or skills to create their own playlists get playlists that match their interests.
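As a rough illustration of how skip behavior could become a taste signal, here is a minimal sketch. The listening log, the genre labels, and the 30% skip cutoff are all assumptions made up for this example; Spotify's actual signals and thresholds are not described in this article.

```python
from collections import defaultdict

# Hypothetical listening log: (track, genre, seconds_played, track_length_seconds).
events = [
    ("track_1", "rock", 210, 210),
    ("track_2", "rock", 200, 240),
    ("track_3", "pop", 12, 180),
    ("track_4", "pop", 15, 200),
    ("track_5", "jazz", 300, 300),
]

SKIP_THRESHOLD = 0.3  # assumed cutoff: playing less than 30% of a track counts as a skip


def genre_affinity(events):
    """Return, per genre, the share of plays that were not skipped."""
    plays = defaultdict(int)
    kept = defaultdict(int)
    for _track, genre, played, length in events:
        plays[genre] += 1
        if played / length >= SKIP_THRESHOLD:
            kept[genre] += 1
    return {genre: kept[genre] / plays[genre] for genre in plays}


print(genre_affinity(events))
```

In this toy log the listener finishes rock and jazz tracks but skips pop almost immediately, so pop ends up with the lowest affinity.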

The more you listen, the more data Spotify gets about you and the better its algorithm becomes at identifying your kind of music, taking you on a personal listening journey.

In the following sections, we will discuss the in-depth workings of this system.

One problem in the traditional music industry was that new creators had to struggle to reach an audience, even if they created music that people would like. Spotify's music recommendation system relies on machine learning that learns what kind of songs you like, then predicts and recommends new songs that you probably haven't listened to but will like.

This gives music creators a chance to become known and gives listeners songs they will like. It makes both listeners and creators happy, and it especially helps creators become the best version of themselves. They don't have to clear hurdles to get recognized, and they can focus on creating music.

First, Spotify tries to collect as much data as it can and to make sense of it in different ways. It creates many shared models that represent the data, and these models are used in many different applications.

Some of them are discussed below:

Spotify has millions of playlists, and it selects the playlists that are relevant for training. Selection is an important factor here, because training on all available playlists will not necessarily give better results.

What it does is remove a song from a particular playlist and then try to guess which track is missing, using the context of other playlists. It uses a Word2Vec-type algorithm.

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located close to one another in the space.

Out of that, they get a cloud of similarities between playlists, tracks, and artists, and try to map how close these artists' music types are to each other, or how close an album is to a particular listener's taste.
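The idea of "tracks that share playlist contexts end up close together" can be sketched without a neural network. The example below is not Word2Vec itself: it swaps in plain co-occurrence counts and cosine similarity as a simple stand-in, and the playlists and track names are invented for illustration. It then guesses a missing track for a partial playlist.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy playlists; every track name here is made up for illustration.
playlists = [
    ["rock_a", "rock_b", "rock_c"],
    ["rock_a", "rock_b", "rock_d"],
    ["rock_b", "rock_c", "rock_d"],
    ["jazz_a", "jazz_b", "jazz_c"],
    ["jazz_a", "jazz_c", "jazz_d"],
    ["jazz_b", "jazz_c", "jazz_d"],
]


def build_context_vectors(playlists):
    """Map each track to counts of the tracks it shares playlists with."""
    counts = defaultdict(lambda: defaultdict(int))
    for playlist in playlists:
        for a, b in combinations(set(playlist), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {track: dict(ctx) for track, ctx in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def guess_missing(partial_playlist, vectors):
    """Pick the track most similar, in total, to the tracks still present."""
    scores = {}
    for candidate in vectors:
        if candidate in partial_playlist:
            continue
        scores[candidate] = sum(
            cosine(vectors[candidate], vectors[t])
            for t in partial_playlist
            if t in vectors
        )
    return max(scores, key=scores.get)


vectors = build_context_vectors(playlists)
print(guess_missing(["jazz_a", "jazz_b"], vectors))
```

A real Word2Vec-style model would learn dense vectors by training to predict held-out tracks, but the resulting geometry is used in the same way: nearby vectors mean similar tracks.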

2. Spotify Home screen: The Spotify Home screen uses a machine learning algorithm known as BaRT.

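The name is often read as Bayesian Additive Regression Trees (BART), a Bayesian "sum-of-trees" model in which each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Note, however, that Spotify's own researchers expand BaRT as "Bandits for Recommendations as Treatments", a contextual bandit approach to ranking recommendations.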

At Spotify, BaRT is used to predict which of the many different shelves to show on the Home screen. A shelf could be "Made for You" or recommendations related to your recent listening history.

How does it give a personalized experience?

The BaRT algorithm works in a very interesting way to learn about its users.

Whenever a user types a search query, the system scores candidate results on several features: the popularity of the item, whether the user has searched for this item before, the similarity of the item to the user's taste, and the distance between the prefix query and the matched item. The ranking model is trained on search interaction logs and uses search sessions that end in a success action as positive examples. All these predictions happen in just milliseconds.
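The four features named above can be sketched as a small feature vector per candidate. Everything below is a hypothetical illustration: the `Candidate` fields, the prefix-distance definition, and the hand-picked weights are assumptions for this example, whereas the real ranker is a learned model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    popularity: float        # assumed to be normalized to the range 0..1
    searched_before: bool    # has this user searched for the item before?
    taste_similarity: float  # assumed similarity to the user's taste, 0..1


def prefix_distance(prefix: str, title: str) -> float:
    """Fraction of the title not yet typed; 1.0 if the prefix does not match."""
    prefix, title = prefix.lower(), title.lower()
    if not title or not title.startswith(prefix):
        return 1.0
    return 1.0 - len(prefix) / len(title)


def features(prefix: str, c: Candidate) -> list:
    return [
        c.popularity,
        1.0 if c.searched_before else 0.0,
        c.taste_similarity,
        prefix_distance(prefix, c.title),
    ]


# Hand-picked weights for illustration only; a real ranker learns them from logs.
WEIGHTS = [0.3, 0.2, 0.4, -0.3]


def score(prefix: str, c: Candidate) -> float:
    return sum(w * x for w, x in zip(WEIGHTS, features(prefix, c)))


def rank(prefix: str, candidates: list) -> list:
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(prefix, c), reverse=True)


candidates = [
    Candidate("Yesterday", popularity=0.9, searched_before=False, taste_similarity=0.2),
    Candidate("Yellow", popularity=0.6, searched_before=True, taste_similarity=0.8),
]
for c in rank("ye", candidates):
    print(c.title, round(score("ye", c), 3))
```

With these made-up numbers, the less popular track ranks first because the user has searched for it before and it is closer to their taste.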

So how does the ranking algorithm get its data?

It basically takes two things into account and assigns a score on that basis.

The loss function used in this system is a listwise loss function.

The listwise approach addresses the ranking problem in the following way. In learning, it takes ranked lists of objects (e.g., ranked lists of documents in IR) as instances and trains a ranking function through the minimization of a listwise loss function defined on the predicted list and the ground truth list. The listwise approach captures the ranking problems, particularly those in IR in a conceptually more natural way than previous work.
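To make "a listwise loss defined on the predicted list and the ground truth list" concrete, here is a minimal sketch of one well-known listwise loss, the ListNet top-one cross-entropy. It is shown only as an example of the listwise family; it is not claimed to be the loss Spotify uses, and the scores are invented.

```python
import math


def softmax(scores):
    """Turn a list of scores into a probability distribution over items."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted_scores, true_scores):
    """Cross-entropy between the top-one distributions of the two lists."""
    p_true = softmax(true_scores)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


true_scores = [4.0, 2.0, 0.0]   # ground-truth relevance for one list of items
good_model = [2.0, 1.0, 0.0]    # agrees with the true ordering
bad_model = [0.0, 1.0, 2.0]     # reverses the true ordering

print(listnet_loss(good_model, true_scores))
print(listnet_loss(bad_model, true_scores))
```

The loss is computed over the whole list at once, so a model that reverses the ordering is penalized more than one that preserves it.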

The training model used is LambdaMART, which maximizes NDCG (averaged over the training dataset) using GBDT (gradient-boosted decision trees).

Training data consists of lists of items with some partial order specified between the items in each list. This order is typically induced by giving each item a numerical score (4 for the success item, 2 for a related item, and 0 for everything else). The ranking model's purpose is to rank, i.e. to produce a permutation of items in new, unseen lists in a way that is "similar" to the rankings in the training data in some sense.
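NDCG, the metric mentioned above, can be computed directly from those grades. The sketch below uses the common exponential gain (2^grade - 1) with a log2 position discount, which is one standard variant and an assumption here; the example sessions are invented.

```python
import math


def dcg(grades):
    """Discounted cumulative gain for grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg(grades_in_ranked_order):
    """DCG divided by the DCG of the ideal (best possible) ordering."""
    ideal = dcg(sorted(grades_in_ranked_order, reverse=True))
    return dcg(grades_in_ranked_order) / ideal if ideal > 0 else 0.0


def mean_ndcg(sessions):
    """Average NDCG over a collection of ranked lists."""
    return sum(ndcg(s) for s in sessions) / len(sessions)


# Each list holds the grades (4, 2, or 0) of the results in the order shown.
sessions = [
    [4, 2, 0, 0],  # success item ranked first: ideal ordering
    [0, 4, 2, 0],  # success item ranked second
]
print(mean_ndcg(sessions))
```

A ranker that places the success item first scores 1.0 on that list, and the score drops as the success item moves further down.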

Spotify describes its successful implementation of machine learning in a Hyperight AB keynote in the following three ways.

In a video of Mr. Bernard Marr he provided the information when he met with the data scientist team of Spotify, they revealed
