Researchers at YouTube Music have developed a new approach to personalized music recommendations using Transformer models, a neural network architecture previously applied to language tasks such as translation and classification. The team, which includes Reza Mirghaderi, Li Yang, Chieh Lo, Jungkhun Byun, Gergo Varady, and Sally Goldman, applied the Transformer architecture to encode user actions on YouTube Music, such as skipping or selecting music tracks.
This allows the model to capture the relationship between user actions and the context in which they occur, leading to more accurate recommendations. The approach combines a Transformer with an existing ranking model to learn a combined ranking that best blends user actions with listening history. Offline analysis and live experiments show that this approach significantly improves the ranking model's performance, reducing skip rates and increasing the time users spend listening to music, a meaningful step forward for personalized recommendations on YouTube Music.
The key insight here is that self-attention layers in transformers can capture relationships between user actions, just as they do with words in a sentence. By applying attention weights to user actions, the model can differentiate between important and less important actions, depending on the context. For instance, when a user is listening to music at the gym, skip actions might receive lower attention weights, while actions taken during other activities might receive more attention.
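To make the mechanism concrete, here is a minimal sketch of self-attention over a sequence of user-action embeddings in PyTorch. The dimensions, number of heads, and random inputs are illustrative assumptions, not details from the research.

```python
import torch
import torch.nn as nn

# Toy setup (all names and sizes are illustrative, not from the paper):
# a sequence of 5 user-action embeddings, each of dimension 32.
embed_dim, seq_len = 32, 5
actions = torch.randn(1, seq_len, embed_dim)  # (batch, sequence, features)

# A single self-attention layer; batch_first=True keeps the (batch, seq, dim) layout.
attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

# Each action attends to every other action; the returned weights show how much
# each action contributes given its context, which is the mechanism that could
# down-weight, say, skips made during a workout.
output, weights = attn(actions, actions, actions)
print(weights.shape)  # (1, 5, 5): attention from each action to every other
```

The attention weights are learned from data, so the model discovers for itself which actions matter in which contexts rather than relying on hand-tuned rules.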
The proposed architecture combines a transformer with an existing ranking model to learn a combined ranking that blends user actions with listening history. The inputs to the transformer include signals describing user actions, such as intention, salience, and metadata, which are projected into vectors and combined with music track embeddings. The output vector from the transformer is then combined with the existing ranking model inputs using a multi-layer neural network.
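The sketch below illustrates that wiring under assumed layer sizes and a mean-pooling step; the class name, feature dimensions, and MLP shape are hypothetical, since the article does not specify them.

```python
import torch
import torch.nn as nn

class TransformerRanker(nn.Module):
    """Illustrative sketch of the described architecture; layer sizes,
    pooling choice, and feature names are assumptions, not the paper's."""

    def __init__(self, action_dim=8, rank_dim=16, embed_dim=32):
        super().__init__()
        # Project raw action signals (intention, salience, metadata) into vectors.
        self.action_proj = nn.Linear(action_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Multi-layer network that blends the transformer output with the
        # existing ranking model's inputs.
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + rank_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, action_signals, track_embeddings, ranking_inputs):
        # Combine projected action signals with the music track embeddings.
        tokens = self.action_proj(action_signals) + track_embeddings
        encoded = self.encoder(tokens)
        pooled = encoded.mean(dim=1)  # assumed pooling over the sequence
        return self.mlp(torch.cat([pooled, ranking_inputs], dim=-1))

model = TransformerRanker()
score = model(torch.randn(1, 5, 8), torch.randn(1, 5, 32), torch.randn(1, 16))
print(score.shape)  # (1, 1): one relevance score per candidate track
```

The key design choice is that the transformer does not replace the existing ranker; its pooled output is simply another input to the final scoring network, so the two components are trained to complement each other.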
The results of offline analysis and live experiments demonstrate that this approach significantly improves the performance of the ranking model, leading to reduced skip rates and increased session lengths. These metrics indicate improved user satisfaction with YouTube Music recommendations.
Future work opportunities include adapting this technique to other parts of the recommendation system, such as retrieval models, and incorporating non-sequential features within the transformer architecture. This could enable self-attention between sequential features, like user actions, and non-sequential features, like artist popularity or music genre.
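One plausible way to do this, sketched below as an assumption rather than the team's stated design, is to project the non-sequential features into an extra token appended to the action sequence, so self-attention can relate them to each action.

```python
import torch
import torch.nn as nn

# Illustrative: mix non-sequential features into the sequence by projecting
# them to a single extra token. All names and sizes are assumptions.
embed_dim = 32
action_tokens = torch.randn(1, 5, embed_dim)   # sequential: user actions
context_features = torch.randn(1, 4)           # non-sequential: e.g. genre, popularity
context_proj = nn.Linear(4, embed_dim)

# Append the projected context as a sixth "token" in the sequence.
context_token = context_proj(context_features).unsqueeze(1)  # (1, 1, 32)
tokens = torch.cat([action_tokens, context_token], dim=1)    # (1, 6, 32)

layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
encoded = layer(tokens)  # action tokens can now attend to the context token
print(encoded.shape)     # (1, 6, 32)
```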
Overall, this article showcases the potential of transformers in ranking models, particularly when it comes to encoding complex user behavior. By leveraging attention mechanisms, these models can better capture the nuances of user interactions, leading to more personalized and engaging recommendations.
