How many streams does a track need to top the PH Spotify charts? # DataViz.
Fork this project on GitHub here
Get the data on Kaggle here
Ever since I was doing my master's back in 2016, I have used the Spotify app for my daily music needs (this is not sponsored! haha). Spotify, as it turns out, hosts the streams data of the songs that make it onto their charts and provides a Web API to get data on any of the songs and artists listed on their platform.
This is very interesting to me, so I downloaded Spotify’s available chart data over the past months. I have been using it for teaching data science and have been asking students to make projects out of it, and I thought, maybe I should also give it a try?
In this post, I analyze the Daily Spotify Top 200 charts in the Philippines to find out what it takes to take the chart’s top position.
My answer to this is shown below. To get the Top 1 position, a track must get streams equal to or higher than the goal set by the green line I call G₁, defined as 1 stream more than the highest streams obtained by the Top 2 position over the past week.
I invite you to hover on the line chart to check out the top track and the goal setting track for a particular day, and fiddle with the controls to focus on a period of interest.
What can we say about the charts?
Here are some of the insights from the data and interactive chart:
• The goal streams G₁ needed to get Top 1 has shown more variance in the past 2 years
- Based on G₁, tracks needed relatively more streams during the first half of 2017 than in most other periods. This is likely because fewer, and mostly more famous, artists were on the platform at that time.
- From the 2nd half of 2017 until past the 1st half of 2019, we see a lower and fairly constant goal as more artists entered and took a share of the total market streams.
- From Q4 2018 to Q1 2020, G₁ rose by almost 30% over previous years and roughly stayed at that level.
- This growth halted during Mar-May 2020, around the time of the first strict nationwide lockdowns, and G₁ fell back to its 2018 level. Strange, because you would expect people to listen to more music during the lockdown, right? The connection between the music streaming market and the pandemic still needs to be studied!
- Come Jun 2020, G₁ became less stable and hit many record highs compared to past years, which coincides with song releases by very famous artists like BTS, Taylor Swift, BLACKPINK, Olivia Rodrigo, and Arthur Nery.
• BTS, Ed Sheeran, Maroon 5 hold record for most Top 1 days
The table below lists the tracks that recorded the most days as Top 1 per year in the PH Spotify charts.
- BTS currently holds the record for the most Top 1 days, at 124 days for 2017-2021. What makes this more impressive is that they accumulated enough days to set this record starting only in 2020, when they first appeared in this table.
- The most Top 1 days in a single year belongs to Ed Sheeran in 2017, with 98/365 days = 26.8% of all calendar days.
- 2018 had the most OPM artists in this table, with December Avenue at 4th, Juan Karlos at 3rd, and IV of Spades getting the top spot, beating big artists like Drake and Ariana Grande.
• Pre-2020 tracks hold the longest Top 1 streaks
The table below lists the tracks that recorded the longest streak as Top 1 in the PH Spotify charts.
- The top 5 longest streaks each kept the Top 1 position for more than 1 month, with the longest-streaking track, Luis Fonsi - Despacito, staying in the top position for almost 3 months in 2017!
- Most of the tracks that kept the Top 1 position for at least a month were released on Spotify in 2019 or earlier, when the market had fewer participants.
- We also see more recent viral hits like BTS - Butter, Olivia Rodrigo - drivers license, and The Kid LAROI - STAY (with Justin Bieber) top the charts and start a streak right after release.
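As a rough illustration of how such a streak table could be built, here is a minimal pandas sketch. The column names (`date`, `track`) and the sample rows are hypothetical placeholders, not the actual dataset, and the sketch assumes one Top 1 row per calendar day with no missing dates.

```python
import pandas as pd

# Hypothetical Top 1 track per day (made-up sequence, one row per chart date).
top1 = pd.DataFrame(
    {
        "date": pd.date_range("2021-01-01", periods=7, freq="D"),
        "track": ["A", "A", "B", "A", "A", "A", "B"],
    }
)

# A new streak starts whenever the Top 1 track differs from the previous day,
# so the running count of those changes labels each consecutive run.
top1["streak_id"] = (top1["track"] != top1["track"].shift()).cumsum()

# Summarize each run: first day, last day, and length in days.
streaks = (
    top1.groupby(["streak_id", "track"])
    .agg(start=("date", "min"), end=("date", "max"), days=("date", "count"))
    .reset_index()
    .sort_values("days", ascending=False)
)

print(streaks.head())
```

Grouping by a run label instead of by track alone keeps separate stints of the same track apart, which is what a "longest streak" ranking needs.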
• More landslide wins occurred in 2021 than in past years
Depending on the market conditions, a track can win the Top 1 position by a huge or small margin over the Top 2 track.
If we define a landslide win as the case when the Top 1 track won by > 30% and, conversely, a tight match as the case when it won by < 5%, and count these win types per year, we arrive at the following chart:
Frequency of landslide and tight battles for the top 1 position in the PH Spotify market
- 2021 has the most landslide wins and the fewest tight matches of all years, which could be due to the big, viral releases we have seen in the Top 1 streak table. The same trend is present in 2019 and, to a lesser extent, 2017.
- On the other hand, 2020 and 2018 saw the opposite trend, where more Top 1 tracks won by a small margin.
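The win-type counting described above can be sketched roughly as follows. The column names and stream values are made up for illustration, and the margin is assumed to be measured relative to the Top 2 streams (so a landslide corresponds to a Top 1 to Top 2 ratio above 1.3).

```python
import pandas as pd

# Hypothetical daily streams for the Top 1 and Top 2 tracks (made-up values).
daily = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2021-01-01", "2021-01-02"]
        ),
        "top1_streams": [105000, 120000, 200000, 131000],
        "top2_streams": [102000, 100000, 100000, 100000],
    }
)

# Winning margin of the Top 1 track, relative to the Top 2 streams.
daily["lead"] = daily["top1_streams"] / daily["top2_streams"] - 1


def classify_win(lead: float) -> str:
    """Label a day as a landslide (> 30%), a tight match (< 5%), or neither."""
    if lead > 0.30:
        return "landslide"
    if lead < 0.05:
        return "tight"
    return "regular"


daily["win_type"] = daily["lead"].apply(classify_win)
daily["year"] = daily["date"].dt.year

# Count win types per year; years are rows and win types are columns.
win_counts = daily.groupby(["year", "win_type"]).size().unstack(fill_value=0)

print(win_counts)
```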
We can generalize this further by not limiting to these two categories and looking at the full distribution of the ratio of the daily Top 1 to Top 2 streams per year:
Yearly distribution of Top 1 to Top 2 streams ratio in the PH Spotify market. The square marker represents the median of the distribution; the circle, the mean; and the short vertical line, the 25th and 75th percentiles.
- Based on the distributions, it seems like most of the Top 1 songs had a 10%-20% streams lead compared to Top 2 songs for 2017-2020.
- We can also see the relatively more frequent landslide wins (ratio > 1.3) during 2017 and 2019 at the minor peaks of the distribution over the ratio range 1.5-1.7.
- Interestingly, 2021 had a bimodal, tail-heavy distribution that differs from the other years. The first peak corresponds to Top 1 tracks with a tighter ~10% lead, while the second, wider peak corresponds to tracks that led by a larger margin of ~30-60%.
Landslide wins
To show examples of landslide wins, here are the 10 Top 1 tracks that had the largest lead over the Top 2 track:
These tracks really dominated the charts with at least double the streams of the Top 2 track for that day.
- Only 2 artists made this list, Arthur Nery and BTS, and all these tracks were released and charted during 2020-2021.
Tight matches
On the other hand, below are the 10 Top 1 tracks that had the narrowest lead over the Top 2 track:
These tracks beat the Top 2 track by only a few hundred streams (<1%).
- 3/10 of these entries are Ariana Grande - thank u next and juan karlos - Buwan swapping positions in the charts with such tight margins.
- Also in this list are contests between international artists (e.g. The Kid LAROI vs LISA) and between OPM artists (e.g. Ben&Ben vs Matthaios).
But first, how was the target metric G₁ defined?
To achieve the Top 1 position in the chart, you will need to garner at least 1 stream more than the streams garnered by the Top 2 position.
However, getting the actual target goal streams in real time is not possible because Spotify only releases the charts data 2 days after the fact. To address this latency, we need to estimate the goal using the streams of the Top 2 positions from the days prior.
Here is how we defined the metric G₁:
DEFINITION: The target goal streams to achieve the Spotify Top 1 position, G₁, is the record maximum streams achieved by all the Top 2 tracks over the past 1 week, plus one more stream.
Below is a worked example of computing G₁ for the target date 2018-09-22:
- According to the goal definition, we look back in the data to get the streams for tracks that got the Top 2 position for the past 7 days (2018-09-15 to 2018-09-21)
- Among those in the list, we get the Top 2 track with the most streams. In this case, it is Ben&Ben - Kathang Isip on 2018-09-19, which got 152797 streams.
- To achieve the top spot, the estimated streams needed to beat it is 1 more than the maximum top 2 streams for the past 7 days. Thus, G₁ = 152797 + 1 = 152798.
- CHECK: Indeed, for 2018-09-22, the track that got the top position, December Avenue - Kung ‘Di Rin Lang Ikaw, exceeded G₁ with 174111 streams.
When the data is already available, we can simulate this lookback process for G₁ very easily by using pandas's rolling().max() method.
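A minimal sketch of that lookback, following the worked example above, might look like the code below. Only the 2018-09-19 value (152797) comes from the example; the other stream counts and the variable names are made-up placeholders. Shifting by one day before rolling is an assumption taken from the example, where the window for 2018-09-22 ends on 2018-09-21.

```python
import pandas as pd

# Hypothetical daily Top 2 streams. Only the 2018-09-19 value (152797)
# comes from the worked example; the other values are made-up placeholders.
top2_streams = pd.Series(
    [140000, 141500, 139800, 150200, 152797, 148300, 147900, 151000],
    index=pd.date_range("2018-09-15", periods=8, freq="D"),
    name="top2_streams",
)

LOOKBACK_DAYS = 7

# shift(1) excludes the target date itself, so the window for 2018-09-22
# covers 2018-09-15 to 2018-09-21. rolling().max() takes the record high
# in that window, and +1 adds the one extra stream needed to beat it.
g1 = top2_streams.shift(1).rolling(LOOKBACK_DAYS).max() + 1

print(g1.loc[pd.Timestamp("2018-09-22")])
```

Dates without a full 7-day history come out as NaN here, since `rolling()` requires a complete window by default.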
How reliable is G₁ as a streams target for the Top 1 chart position?
Being the skeptic that I am, I put this metric to the test by analyzing how it performs as a target for the Top 1 position. Here is the same ratio distribution plot, but this time comparing the Top 1 streams against G₁.
Yearly distribution of Top 1 to G₁ streams ratio in the PH Spotify market. The square marker represents the median of the distribution; the circle, the mean; and the short vertical line, the 25th and 75th percentile.
It turns out that, taking all the years together, some 24.7% of the Top 1 tracks got fewer streams than G₁!
What is the reason for this? On these days, G₁ overestimated the streams required to reach Top 1, because the most streamed Top 2 track of the previous week was unusually high, high enough to have taken the Top 1 position had it performed the same this week.
Here are some examples of songs that achieved Top 1 but didn’t exceed G₁—achieving only half of it in fact!
Here we have some very interesting cases:
- 6/10 of the entries in the table above refer to G₁ values based on Ariana Grande - Santa Tell Me right before the Christmas week. The tracks that got Top 1 without beating the goal based on this track are either (1) another Christmas song (e.g. Mariah Carey - All I Want for Christmas Is You) or (2) another one of Ariana’s songs, 34+35. This case highlights the effect of holidays on our collective listening habits.
- Sometimes, a track can set the G₁ streams goal, reach only half of it, and still achieve the Top 1 position. This is the case for the track Taylor Swift - Cardigan during Jul 29-30, 2020. This happens in the rare case when the latest Top 2 track loses streams faster than the Top 1 track.
Error analysis
This 24.7% of all Top 1 tracks that fell short of the goal set by G₁ may seem to dent its purpose, but when we look at the cumulative proportion of the streams ratio of these tracks in the plot below:
Cumulative proportion of the Top 1 to G₁ streams ratio for Top 1 tracks whose streams were below G₁
we see that only about 20% of this 25% (i.e. about 5% of all Top 1 tracks) fall more than 10% below the G₁ for that day. Thus, if we allow a 10% margin below G₁, it correctly predicted the Top 1 streams goal on about 95% of all days considered. This is quite satisfactory for me.
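The arithmetic here is roughly 0.20 × 0.247 ≈ 0.05, which leaves about 95% of days within the margin. On data, the same check could be sketched as below. The values and column names are made up, and counting a day as a hit when the Top 1 streams reach at least 90% of G₁ is my reading of the 10% margin.

```python
import pandas as pd

# Hypothetical Top 1 streams and G1 targets for ten days (made-up values).
df = pd.DataFrame(
    {
        "top1_streams": [120, 110, 105, 100, 130, 98, 95, 92, 85, 140],
        "g1": [100] * 10,
    }
)

# Ratio of the actual Top 1 streams to the G1 target for each day.
ratio = df["top1_streams"] / df["g1"]

MARGIN = 0.10  # tolerance below G1 that still counts as a hit (assumption)

# Share of days where Top 1 reached G1 outright.
strict_accuracy = (ratio >= 1).mean()

# Share of days where Top 1 came within the margin of G1.
margin_accuracy = (ratio >= 1 - MARGIN).mean()

print(strict_accuracy, margin_accuracy)
```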
Is 7 days good enough as a lookback period for G₁?
The window size used to take in historical data is crucial in any forecasting exercise, so I tested a lookback period ranging from 1 to 90 days from the target date.
Below, I plot the proportion of Top 1 tracks with streams above G₁:
Proportion of Top 1 tracks greater than G₁ as a function of the lookback period length in days
A lookback period of <3 days practically guarantees the G₁ metric’s success, with an accuracy of >98%. However, a lookback period this short might not work when deployed because of Spotify’s chart data latency. The accuracy drops relatively steeply until around 25-30 days, then continues to decrease monotonically but more gradually.
I could have chosen 4-6 days but ended up picking a slightly larger period of 7 days because of the stronger smoothing of the G₁ curve (makes it look neater!), and also the practicality of just looking at a week’s worth of data.
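A sweep over lookback periods like the one described above could be sketched as follows. The stream series are synthetic stand-ins generated at random, not the real chart data, and `hit_rate` is a hypothetical helper name, so the resulting numbers will not match the plot.

```python
import numpy as np
import pandas as pd

# Synthetic daily streams standing in for the real chart data (assumption):
# the Top 2 track gets a random fraction of the Top 1 streams each day.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=365, freq="D")
top1 = pd.Series(rng.integers(100_000, 200_000, size=len(dates)), index=dates)
top2 = (top1 * rng.uniform(0.6, 0.99, size=len(dates))).astype(int)


def hit_rate(top1: pd.Series, top2: pd.Series, lookback_days: int) -> float:
    """Share of days where the Top 1 streams reached G1 for a given lookback."""
    g1 = top2.shift(1).rolling(lookback_days).max() + 1
    valid = g1.notna()  # skip the first days, which lack a full window
    return float((top1[valid] >= g1[valid]).mean())


# Evaluate every lookback period from 1 to 90 days.
hit_rates = pd.Series({n: hit_rate(top1, top2, n) for n in range(1, 91)})

print(hit_rates.head(10))
```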
Where can we take this further?
• G metric for each chart position
If you sensed where I was headed with my notation for the Top 1 streams goal, G₁, you guessed right: it can be generalized to set a goal for any chart position i, Gᵢ (hence the subscript!).
For example, G₂ is the goal for the Top 2 position, which will be based on the streams of the most recent Top 3 position, and so on.
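One way this generalization might look in pandas is sketched below. The long-format layout (`date`, `position`, `streams`), the sample values, and the function name `goal_streams` are all assumptions for illustration, and the sketch reuses the same lookback idea as G₁.

```python
import pandas as pd

# Hypothetical long-format chart data (made-up values):
# one row per chart date and chart position.
charts = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-01-01"] * 3 + ["2021-01-02"] * 3 + ["2021-01-03"] * 3
        ),
        "position": [1, 2, 3] * 3,
        "streams": [300, 200, 100, 310, 220, 90, 280, 210, 120],
    }
)


def goal_streams(charts: pd.DataFrame, position: int, lookback_days: int = 7) -> pd.Series:
    """G_i: record streams of position i + 1 over the lookback window, plus one."""
    next_position = (
        charts.loc[charts["position"] == position + 1]
        .set_index("date")["streams"]
        .sort_index()
    )
    # shift(1) keeps the target date itself out of its own lookback window.
    return next_position.shift(1).rolling(lookback_days).max() + 1


# A 2-day lookback is used only because the sample data is tiny.
g1 = goal_streams(charts, position=1, lookback_days=2)
g2 = goal_streams(charts, position=2, lookback_days=2)

print(g2)
```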
• Exploring streams inflation
You might also be reminded of FOREX/stocks charts when you looked at the Spotify streams chart above. I took note of that as a simple resemblance—but then it hit me that streams can also be thought of as a currency in the music streaming market.
Streams decide which position the track will reach in the charts, and like real currency, the ability of a stream to take a track to a position—its position “purchasing power”—also changes with time.
To illustrate this, the streams in 2017 that would have gotten you to the Top 1 position may not get the track to the same position today.
Moreover, a Top 1 position achieved during days when very famous artists are competing is not the same as a Top 1 position on days when there are no new releases from them.
This raises the question: could there be some kind of inflation index that we can define to determine the relative value of a stream compared to a base period? And can this index be used to correct the current streams value to uncover the real growth of a track in the streaming market?
I will explore this idea in Part 2 of this topic.
Where did the data come from?
For this analysis, I used the data I collected from Spotify charts.
This is relatively lighter data gathering compared to my previous posts. I purposefully did not get any more data for this to train myself to extract insights as rich as possible from limited data.
What are the tools I used?
I used pandas to read and analyze the dataset, matplotlib and seaborn to make some basic plots, and the line chart in bokeh to make the final interactive chart.
I invite you to look at and try out the Jupyter notebook in this GitHub repo.
Thanks so much for reading!