Evaluating Lyrics using NLP
“Artificial Intelligence is whatever hasn’t been done yet.”
- Larry Tesler
Introduction
Aren’t songs just words set to some background music?
Songwriting is an art that intrigues me. To an inexperienced person, it might seem as simple as jotting down a few lines about a topic and adding a track to go along. But it is far from that: a song weaves several interlinked ideas into a melodic whole. Unsurprisingly, computers often struggle to interpret the intention of a given song and its overall context.
However, with the continued development of Natural Language Processing (NLP), computers are beginning to understand human language, and song lyric analysis is an effective application of this. Codecademy has an article that compares the themes of Taylor Swift’s songs over the years using only the lyrics. I found the idea of such a model quite interesting and decided to give it a try myself. My first attempt to implement the model on my laptop failed. Luckily, Codecademy had a similar tutorial on YouTube, and with a few edits to the code, I was able to set up the model myself.
Analysis of Taylor Swift’s Lyrics
I coded the model using the Kaggle dataset of Taylor Swift’s lyrics. The model rated her songs on 6 themes that are prevalent in them, namely love, memories, breakups, party, homesick, and independence. The algorithm is relatively simple: each of her songs is rated on each of the 6 themes on a scale of 0 to 1.5. The more words pertaining to a theme a song contains, the higher its rating for that theme. For example:
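As a rough illustration of this kind of rating, here is a minimal Python sketch. The exact formula behind the 0 to 1.5 ratings is not shown in this post, so the sketch assumes the simplest possible version: a theme's score is the share of a song's words that belong to that theme. The theme word lists and the names `THEMES`, `tokenize`, and `theme_scores` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical theme vocabularies; the word lists used in the post are not shown.
THEMES = {
    "love": {"love", "heart", "kiss"},
    "party": {"party", "dance", "night"},
}

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def theme_scores(lyrics, themes=THEMES):
    """Return, per theme, the share of words in the lyrics that belong to it."""
    words = tokenize(lyrics)
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty lyrics
    return {
        theme: sum(counts[word] for word in vocab) / total
        for theme, vocab in themes.items()
    }

scores = theme_scores("We dance all night and I love this party")
print(scores)
```

In this toy line, three of the nine words belong to the party theme and one to the love theme, so party scores higher. This assumed score stays between 0 and 1, so it is not on the same scale as the ratings in the post.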
I was quite satisfied with the model’s evaluation and proceeded to plot the values on a graph:
However, I soon learned that this graph was far from accurate. Apparently, the words relating to the topics of love and party were being treated as stopwords (stopwords are extremely common words, such as “the” and “and”, that are filtered out before analysis because they carry little meaning on their own). After fixing the stopword issue, there was a significant improvement:
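The stopword problem can be reproduced in a few lines of Python. This is only a sketch: the stopword list below is a tiny, made-up one, and "love" and "party" are placed in it deliberately to mimic the issue. The names `STOPWORDS`, `THEME_WORDS`, and `remove_stopwords` are hypothetical and not taken from the tutorial's code.

```python
# A tiny illustrative stopword list; real lists are much longer.
# "love" and "party" are included only to reproduce the problem described above.
STOPWORDS = {"the", "a", "and", "i", "you", "to", "love", "party"}

# Words that the themes depend on (a hypothetical selection).
THEME_WORDS = {"love", "party"}

def remove_stopwords(words, stopwords):
    """Keep only the words that are not in the stopword set."""
    return [word for word in words if word not in stopwords]

words = "i love you and i love the party".split()

# With the unmodified list, every theme word is thrown away.
before = remove_stopwords(words, STOPWORDS)

# Taking the theme words out of the stopword list keeps them in the text.
after = remove_stopwords(words, STOPWORDS - THEME_WORDS)

print(before)
print(after)
```

If the theme words never reach the scoring step, the corresponding themes can only score zero, which would explain a misleading graph.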
Analysis of Sara Kays’ Lyrics
I was frankly quite happy with the model’s performance on Taylor Swift’s lyrics. However, I wanted to see if it would work as well with Sara Kays. Certain themes are quite evident in her songs, and I wondered whether the model would be able to pick them up. Unfortunately, Kaggle did not have a dataset of her songs, so I decided to make my own.
Preparing the Dataset
Preparing the dataset was easier than I expected. Using Microsoft Excel, I compiled the song lyrics from genius.com into a CSV file. This time, however, I intended to analyze the songs both as singles and as entire EPs. So I copied and edited the new dataset to get the 2 desired sets of raw data.
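For readers who would rather build the two sets of raw data in code than by copying the spreadsheet, here is a minimal sketch using Python's standard `csv` module. The column names (`title`, `ep`, `lyrics`) and the sample rows are assumptions made for illustration and do not reflect the layout of my actual file.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the CSV file; the column names and rows are assumptions.
CSV_TEXT = """title,ep,lyrics
Song A,EP One,first line of song a
Song B,EP One,first line of song b
Song C,,a single with no ep
"""

def load_rows(handle):
    """Read every CSV row into a dict keyed by column name."""
    return list(csv.DictReader(handle))

def lyrics_by_song(rows):
    """One entry per song: title -> lyrics."""
    return {row["title"]: row["lyrics"] for row in rows}

def lyrics_by_ep(rows):
    """One entry per EP: EP name -> lyrics of all its songs joined together."""
    grouped = defaultdict(list)
    for row in rows:
        if row["ep"]:  # skip songs that are not on an EP
            grouped[row["ep"]].append(row["lyrics"])
    return {ep: " ".join(parts) for ep, parts in grouped.items()}

rows = load_rows(io.StringIO(CSV_TEXT))
songs = lyrics_by_song(rows)
eps = lyrics_by_ep(rows)
print(len(songs), "songs,", len(eps), "EP")
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.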
Making the Model
Using a technique called tf-idf (term frequency-inverse document frequency), the model computed a weight for every word in each song that reflects how important that word is to the song relative to the rest of the dataset. I then labelled a few themes that I felt were relevant to the songs:
Ratings were assigned to both the individual songs and the EPs as a whole across the 5 themes:
Unlike the ratings for Taylor Swift’s lyrics, these numbers actually crossed 1, and some reached a value as high as 1.4. A higher rating indicates that a song uses more vocabulary specific to the theme, or expresses the theme more strongly throughout.
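One way to picture how tf-idf weights can turn into theme ratings is the sketch below. It uses the textbook formula (term frequency multiplied by the log of the inverse document frequency) written from scratch, and it assumes that a theme's rating is simply the sum of the weights of that theme's words. Both choices are assumptions made for illustration. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant, so their numbers will differ. The songs, theme word lists, and function names here are made up.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document (each a list of words)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # Term frequency times the log of the inverse document frequency.
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

def theme_rating(word_weights, theme_words):
    """Assumed rating: the sum of the tf-idf weights of the theme's words."""
    return sum(word_weights.get(word, 0.0) for word in theme_words)

# Made-up miniature "songs" and theme vocabularies.
songs = [
    "home home miss home tonight".split(),
    "dance tonight dance party".split(),
    "miss you tonight".split(),
]
THEMES = {"homesick": {"home", "miss"}, "party": {"dance", "party"}}

weights = tf_idf(songs)
ratings = [
    {theme: theme_rating(w, vocab) for theme, vocab in THEMES.items()}
    for w in weights
]
for rating in ratings:
    print(rating)
```

Note that "tonight" appears in every toy song, so its weight is zero: tf-idf rewards words that are frequent in one song but rare across the dataset, which is why theme-specific vocabulary pushes a rating up.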
Results
Analysis of Sara Kays’ Singles (2020–2021):
Analysis of Sara Kays’ EPs (2018–2021):
My Inference
In my opinion, the model did significantly better on Sara Kays’ lyrics than on Taylor Swift’s. One possible reason is the size of the datasets: Taylor’s had about 90 songs, whereas Sara’s had only about 20. The larger and more varied set of songs could have made the themes harder for the model to separate in the former, while the smaller number of songs in the latter may have made them easier to pick out. Thank you for reading!
Helpful Links:
Kaggle Dataset
https://www.kaggle.com/PromptCloudHQ/taylor-swift-song-lyrics-from-all-the-albums
Codecademy — Using Machine Learning to Analyze Taylor Swift’s Lyrics
https://www.codecademy.com/resources/blog/taylor-swift-lyrics-machine-learning/
Codecademy — Analyze song lyrics with Python