#kpop Analysis PART 2: What can we learn about TWICE's line succession networks? #DataViz.

Get the data in Kaggle here

This is Part 2 of a 2-part data analysis post. To learn how the dataset was created and what line succession means, please check out Part 1 here



UPDATE (June 2021): I added TWICE's newest single, Alcohol Free (2021), in the visualization. The ALL SONGS network remains configured using their past Korean singles from 2015-2020.




In Part 1 of this project, I showed you how to visualize line successions (how song lines are transferred from one member to another) using an interactive chord diagram. What else can we make out of the dataset? Can we say something more about the structure of TWICE songs?


Below is the completed visualization for the second part of this data science project. This is an interactive network of TWICE’s line successions in their Korean singles.

I invite you to fiddle with the controls and get to know how it represents each TWICE song.




Let me take you through its most important parts.

What are networks?

It’s often said that everything in the world is connected. If we observe keenly and long enough, we can form a web of relationships, or network, out of basically any set of objects we can find.

A simple network represented by a graph

In mathematics, networks are represented as a graph. The objects are referred to as nodes (the circles) and any two nodes are connected by an edge (the lines) if they are somehow related to each other.

Very simple, right? It turns out that with these 2 simple concepts, we can recreate very complex structures from the real world. The field of network science studies these relationship structures and has already found numerous applications in many disciplines, such as computer science (neural networks, internet science), biology (epidemiology), chemistry (molecular stability), economics (trade, political dynasties), sociology (group dynamics) and a lot more!

What are the properties of the line succession network?

Here I have applied network science to look for patterns in TWICE’s songs. I call the network I have generated the line succession network.

What are some of its properties?

  1. It is a directed network. The edges that connect any two members have a direction, as indicated by the arrow. For example, an arrow going from the Sana node to the Jihyo node means that Sana transfers the line to Jihyo (i.e. Jihyo sings right after Sana) at least once during the song.

In contrast, the chord diagram in Part 1 of this work is actually an example of an undirected network, since we did not encode the direction of the succession there (i.e. Jihyo->Sana is the same as Sana->Jihyo).

  2. Edges pointing toward a node can be thought of as inflows, while those pointing outward from the node are outflows. In a line succession network, a member who receives a line must pass it on to another member, so in general there is an outflow edge for every inflow edge. The exceptions are the member who starts the song, who has 1 less inflow edge, and the member who ends it, who has 1 less outflow edge. In the chart above, this isn’t always visible because edges are allowed to overlap with each other.

  3. It is a connected network. A network is said to be connected if you can reach any node from any other node by tracing the edges in between them. This is true here since all members have parts and all parts are connected within the song.
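To make these properties concrete, here is a minimal Python sketch that builds a directed network from an ordered list of singers and checks the properties above. The line order is invented for illustration and is not taken from the dataset, and the function names are my own. I am also assuming that two consecutive lines by the same member do not count as a succession, and that "connected" is checked while ignoring edge direction.

```python
from collections import Counter

def succession_edges(line_order):
    """Count each (singer, next singer) pair in an ordered list of lines."""
    pairs = zip(line_order, line_order[1:])
    # Assumption: consecutive lines by the same member are not a succession.
    return Counter((a, b) for a, b in pairs if a != b)

def degree_balance(edges):
    """Return {member: inflows - outflows}, counting repeated successions."""
    balance = Counter()
    for (src, dst), count in edges.items():
        balance[src] -= count
        balance[dst] += count
    return dict(balance)

def is_connected(edges):
    """True if every member is reachable when edge direction is ignored."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(neighbours)

# Hypothetical line order, invented for this example.
toy_line_order = ["Sana", "Jihyo", "Nayeon", "Sana", "Jihyo", "Mina"]
toy_edges = succession_edges(toy_line_order)
toy_balance = degree_balance(toy_edges)
```

In this toy song, the member who opens has a balance of -1 (one less inflow), the member who closes has +1 (one less outflow), and everyone in between balances to 0.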

What can we say about TWICE’s line succession networks?

We have generated line succession networks for each of the 14 TWICE Korean singles so far, plus one that serves as an overall fundamental pattern for all these songs–the ALL SONGS line succession network.

1. The ALL SONGS line succession network

The ALL SONGS network represents the top 30% most frequently occurring line succession pairs in TWICE’s Korean singles from 2015-2020. It contains the line succession pairs that occurred at least 8 times across all of these songs.

Why 8, you may ask? This threshold could be obtained by looking at the distribution of succession pair counts:


TWICE line succession pairs occurrences across their Korean singles from 2015-2020


The histogram peak shows that most of the ${}_9\mathrm{P}_2 = 72$ possible succession pairs occurred 5 to 8 times over all the songs considered. The frequency starts to decline at count = 8, which also turns out to be the 70th percentile of the distribution. This natural breakpoint was used to define the line succession pairs that were included in the ALL SONGS network.
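The counting and thresholding step can be sketched as follows. This is only an illustration of the idea: the counts are placeholders (1 to 72) rather than the real dataset values, `frequent_pairs` is a name I made up, and I am assuming the "inclusive" percentile method from Python’s `statistics` module, which may differ slightly from whatever tool produced the chart above.

```python
import math
from itertools import permutations
from statistics import quantiles

members = ["Nayeon", "Jeongyeon", "Momo", "Sana", "Jihyo",
           "Mina", "Dahyun", "Chaeyoung", "Tzuyu"]

# Every ordered (from, to) pair of two different members: 9P2 of them.
all_pairs = list(permutations(members, 2))
n_possible = math.perm(9, 2)

def frequent_pairs(pair_counts, percentile=70):
    """Keep pairs whose count reaches the given percentile of all counts."""
    cut_points = quantiles(pair_counts.values(), n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    return {pair: c for pair, c in pair_counts.items() if c >= threshold}

# Placeholder counts, NOT the real dataset values.
toy_counts = {pair: i + 1 for i, pair in enumerate(all_pairs)}
network_edges = frequent_pairs(toy_counts)
```

With real counts, `network_edges` would hold the pairs at or above the 70th percentile, which is how the "top 30%" cut is defined here.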

For those who are familiar with TWICE, a good look at the ALL SONGS network would reveal that the members are actually clustered according to the role they usually play in the song, or their position.

The ALL SONGS network detecting TWICE member positions in the group

What are these positions and how is this shown in the ALL SONGS network?

  1. Subvocals/Bridge group - This group is composed of Sana, Tzuyu, and Mina, who most often sing the song’s bridges and verse lines. I actually expected these 3 to have edges to each other (i.e. a fully-connected subgraph), but Sana->Mina did not make the cut because it had only 7 occurrences.

  2. Main vocal/Chorus group - The power vocalists Nayeon, Jihyo and Jeongyeon usually sing the chorus and form this tight group that lies at the center of the network.

  3. Prechorus group - Mina and Momo often sing before the chorus, or even after the chorus for the song’s hook lines. These two are the non-main vocalists that have the most edges to the Main vocal/Chorus group.

  4. Rap group - Lastly, the group at the far right is composed of the rappers Chaeyoung and Dahyun. Interestingly, according to the ALL SONGS network, the rap group is triggered most often after a Nayeon line. Moreover, following the network, if Dahyun ends the rap, she would most likely turn over the next part to Jihyo, while if Chaeyoung ends the rap, it would be Nayeon who will sing next.

I’d like to point out that these groupings emerged in the network without either labelling the members with their positions or tagging the lines to song parts. These line succession pairs repeat so frequently in the dataset that these expected patterns were bound to appear.

In fact, you may try to create a “hypothetical” TWICE song with the ALL SONGS network by starting with one member node and tracing out a sequence of members by following the edges that connect them!
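If you would rather let the computer do the tracing, a random walk over the edges is one way to sketch it. The network below is made up (letters instead of members) and is not the actual ALL SONGS network; `trace_song` is a hypothetical helper, and picking the next member uniformly at random is my own simplification.

```python
import random

def trace_song(successors, start, n_lines, seed=None):
    """Follow edges from `start`, picking a random next member each time."""
    rng = random.Random(seed)
    sequence = [start]
    while len(sequence) < n_lines:
        options = successors.get(sequence[-1], [])
        if not options:  # dead end: this member has no outgoing edge
            break
        sequence.append(rng.choice(options))
    return sequence

# Made-up edges for illustration, not the actual ALL SONGS network.
toy_network = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
}
song = trace_song(toy_network, start="A", n_lines=8, seed=1)
```

Every consecutive pair in `song` is an edge of the network, so the result is one possible line order that the network allows.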

2. Line succession networks of each individual song

We can better appreciate the diversity of network structures when we look at each of their individual songs.



Line succession networks for (a) Feel Special (2019) and (b) Heart Shaker (2017)



For instance, the regular-blocked verses of the fairly mellow song Feel Special resulted in a simple network with sparse edges between the members. In contrast, the rapid exchanges of lines in the energetic song Heart Shaker produced a densely-connected network that contains some of the rarest succession pairs (e.g. Sana->Jeongyeon, Chaeyoung->Momo).

But despite the differences, both songs still contain edges from the ALL SONGS network. To be exact, about a third to half of all edges in each of TWICE Korean singles are from the ALL SONGS network!
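The share quoted above can be computed with a simple set intersection. The sketch below assumes that each distinct edge is counted once regardless of how often it repeats in the song, and the edge sets are invented for illustration.

```python
def shared_edge_fraction(song_edges, reference_edges):
    """Fraction of a song's distinct edges that also appear in the reference."""
    song_edges = set(song_edges)
    if not song_edges:
        return 0.0
    return len(song_edges & set(reference_edges)) / len(song_edges)

# Invented edge sets, for illustration only.
toy_all_songs = {("A", "B"), ("B", "C"), ("C", "A")}
toy_single = {("A", "B"), ("B", "D"), ("D", "C"),
              ("C", "A"), ("A", "D"), ("D", "A")}
fraction = shared_edge_fraction(toy_single, toy_all_songs)
```

Here 2 of the 6 edges in the toy single are also in the toy reference network, so the fraction is one third.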

I mentioned in Part 1 of this blog that judging a member’s “relevance” solely by her total line duration is too simplistic and omits effects that depend on how the song is structured. Now that we have these network representations, could we somehow measure a member’s relevance while accounting for this complexity?

How can we measure relevance in a network?

Oftentimes, one of the goals of analyzing a network is to identify nodes that “play” a bigger role and are therefore more important compared to other nodes. Centrality measures try to quantify this.

There are various centrality measures developed to capture different nuances of “importance”, but we only use 2 kinds in this post: betweenness centrality and eigenvector centrality.

As defined by Zinoviev (2018):

  • A node’s betweenness centrality is the fraction of all possible shortest paths that pass through it. If the betweenness is high, the node functions as a go-between (thus the name) and could serve to connect any two nodes most efficiently. The removal of such a node would greatly disrupt the flow and possibly split the network into disconnected components.

  • A node’s eigenvector centrality is less straightforward to define mathematically (we might want to review our linear algebra first!). Conceptually, it quantifies the importance of a node by looking at the importance of its neighbors, i.e. nodes directly connected to it. High eigenvector centrality nodes are those that are surrounded by other nodes with high eigenvector centrality–almost a nod to the saying “Tell me who your friends are, and I will tell you who you are.”
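For readers who want the formulas, the two measures are commonly written as below. The notation is mine rather than quoted from Zinoviev (2018): $n$ is the number of nodes, $\sigma_{st}$ is the number of shortest paths from node $s$ to node $t$, $\sigma_{st}(v)$ is the number of those paths that pass through $v$, $x_v$ is the eigenvector centrality of $v$, and $\lambda$ is the largest eigenvalue of the network’s adjacency matrix. I am assuming the normalized form of betweenness for directed networks and the inflow-based form of eigenvector centrality (a node inherits importance from the nodes pointing to it), which may not match every library’s defaults.

$$
c_B(v) = \frac{1}{(n-1)(n-2)} \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},
\qquad
\lambda \, x_v = \sum_{u \to v} x_u
$$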

How can we interpret centrality metrics computed from the line succession networks?

These are how the 2 centralities are defined in general, but how do we interpret centrality measures in the context of line succession networks?



1. The betweenness centrality of a member measures the extent of her connective role in the song, i.e. a high betweenness member would precede and succeed the greatest number of members, serving as a bridge that binds the song together.


Let’s take the simplest song network, Feel Special, as an example to illustrate this.

  • Jihyo comes 1st in betweenness centrality because she occupies the middlemost position in the network and thus gets often crossed when paths are drawn to connect any two other members. Her role is central enough that when her node is removed, 2 members, Jeongyeon and Dahyun, would be completely disconnected from the network.

  • On the other hand, Chaeyoung comes in last place and actually has a betweenness centrality value of 0. This is because her only line is at the beginning of the song and does not serve to connect any 2 members aside from the member next to her (Tzuyu).

    Line succession network for Feel Special (2019)
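To see where such numbers can come from, here is a minimal sketch of betweenness centrality for an unweighted directed network, using the well-known Brandes algorithm. It is not necessarily how the values in this post were computed. The toy network is made up and is not the Feel Special network, and the normalization by $(n-1)(n-2)$ is an assumption on my part.

```python
from collections import deque

def betweenness_centrality(successors, normalized=True):
    """Betweenness for an unweighted directed graph (Brandes' algorithm).

    `successors` maps every node to a list of the nodes it points to.
    """
    nodes = list(successors)
    score = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search that also counts shortest paths.
        dist = {source: 0}
        n_paths = {v: 0 for v in nodes}
        n_paths[source] = 1
        preds = {v: [] for v in nodes}
        visit_order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in successors[v]:
                if w not in dist:            # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting the nodes in between.
        credit = {v: 0.0 for v in nodes}
        for w in reversed(visit_order):
            for v in preds[w]:
                credit[v] += n_paths[v] / n_paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]
    if normalized and len(nodes) > 2:
        scale = (len(nodes) - 1) * (len(nodes) - 2)
        score = {v: s / scale for v, s in score.items()}
    return score

# Made-up network: "opener" only hands off a line and never receives one.
toy_graph = {
    "opener": ["a"],
    "a": ["hub"],
    "hub": ["b", "c"],
    "b": ["hub"],
    "c": [],
}
btwn = betweenness_centrality(toy_graph)
```

In this toy network, "hub" sits on 5 of the 12 possible ordered pairs’ shortest paths and scores highest, while "opener" scores 0 because no path between two other nodes ever passes through it, which mirrors Chaeyoung’s case above.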


2. The eigenvector centrality of a member indicates whether she has a dominant role, which would mean she directly precedes/succeeds other central members, or an accent role, which would mean she is less connected but serves as some sort of atypical position to "spice up" the song.



Again, using Feel Special:

  • Nayeon has the top eigenvector centrality because she has 2 inflows from relatively high eigenvector centrality members Jihyo (2nd) and Sana (6th) and just 1 distinct (but repeated) outflow to Mina (4th). She does serve a dominant role in this song since these edges come from those who precede her chorus lines.

  • Meanwhile, Chaeyoung only has one edge, and that edge is an outflow. Since no one connects towards her, her eigenvector centrality is 0. Assigning her to open the song is quite rare, and she was likely put there as an accent to highlight the first verse.
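The "no inflow means zero" behaviour can be reproduced with a small power-iteration sketch of eigenvector centrality. This is an illustration under my own assumptions, not the code behind the numbers above: importance flows along inflows, scores are normalized to unit length, and each node’s own score is added back at every step (a standard shift that helps the iteration settle without changing the final ranking). The toy network is made up.

```python
def eigenvector_centrality(successors, n_iter=200):
    """Inflow-based eigenvector centrality by power iteration.

    `successors` maps every node to a list of the nodes it points to.
    """
    nodes = list(successors)
    x = {v: 1.0 for v in nodes}
    for _ in range(n_iter):
        # Keep each node's own score, then add the scores flowing in
        # from the nodes that point to it.
        new_x = dict(x)
        for v in nodes:
            for w in successors[v]:
                new_x[w] += x[v]
        norm = sum(value * value for value in new_x.values()) ** 0.5
        x = {v: value / norm for v, value in new_x.items()}
    return x

# Made-up network: nobody points to "opener".
toy_flow = {
    "opener": ["a"],
    "a": ["hub"],
    "hub": ["a", "b"],
    "b": ["hub"],
}
eig = eigenvector_centrality(toy_flow)
```

Because nothing flows into "opener", her score shrinks toward 0 as the iteration proceeds, while "hub", which receives lines from two other nodes, ends up with the highest score.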

What can we say about the centrality metrics of each member?

We can look at the range of each member’s centrality metrics across all their Korean singles.

1. Betweenness Centrality


Betweenness centrality distributions of each TWICE member across all their Korean singles. The white box locates the median of the distribution

  • From the betweenness centrality distribution above, we can see that Jihyo, Nayeon and Mina have the most connective roles across all of their Korean singles so far. These 3 also have the most lines, giving them more opportunities to go in between any 2 other members’ parts.

  • Tzuyu coming in 4th place here is quite a surprise, given that her total line duration is less than half of Nayeon’s and Jihyo’s, and a fifth less than Mina’s.

  • Momo and the rap group, Chaeyoung and Dahyun, have medians close to Tzuyu’s, but with noticeably more left-skewed distributions.

  • Lastly, Sana and Jeongyeon have the lowest betweenness centrality values and thus the least connective roles in almost all the songs.

2. Eigenvector Centrality


Eigenvector centrality distributions of each TWICE member across all their Korean singles. The white box locates the median of the distribution

  • According to their eigenvector centrality measures, we could see again that Jihyo, Nayeon and Mina had the most dominant roles in their songs.

  • Surprisingly, Sana now comes in 4th place for this metric. This is because her parts often come right before or after those of the more dominant Nayeon, Jihyo or Mina, although she does not have enough connections to also merit a connective role.

  • Momo, Dahyun, Tzuyu and Chaeyoung have similar distribution medians at the middle of the pack. Chaeyoung, in particular, has a bimodal distribution shape, which indicates that she serves dominant roles for a few songs.

  • Lastly, Jeongyeon ranks lowest in eigenvector centrality despite ranking 4th in total line duration. As Jeongyeon’s voice is quite rich and strong, she is often used to accent a chorus line and thus only sparingly connects to other members, dominant or not.

What song did each member top in terms of centrality?

We could also look at each member’s best metrics and try to validate as a listener whether she did serve a relevant role in that particular song.


TWICE members’ top songs, according to their centrality measures (eig_c = eigenvector centrality, btwn_c = betweenness centrality)

| Member | eig_c (rank) | Top song by eig_c | btwn_c (rank) | Top song by btwn_c |
|---|---|---|---|---|
| Nayeon | 0.563 (1st) | What Is Love | 0.411 (3rd) | Cheer Up |
| Jeongyeon | 0.226 (1st) | Yes Or Yes | 0.410 (4th) | Signal |
| Momo | 0.411 (2nd) | Feel Special | 0.469 (2nd) | Feel Special |
| Sana | 0.232 (3rd) | TT | 0.445 (2nd) | Fancy |
| Jihyo | 0.580 (1st) | Like Ooh Ahh | 0.591 (1st) | Like Ooh Ahh |
| Mina | 0.509 (1st) | Cheer Up | 0.486 (1st) | Yes Or Yes |
| Dahyun | 0.257 (2nd) | Cry For Me | 0.469 (2nd) | TT |
| Chaeyoung | 0.314 (2nd) | Fancy | 0.521 (1st) | Fancy |
| Tzuyu | 0.339 (5th) | Feel Special | 0.488 (1st) | Heart Shaker |

  • Sometimes, it all boils down to the number (not duration!) of lines in the song. For instance, Chaeyoung’s highest-ranked song for both metrics is Fancy because she delivered the repeating iconic hook line “Fancy, ooh!” there. Meanwhile, Jihyo’s highest-ranked song for both metrics is their debut single, Like Ooh Ahh, where she took the most lines. In a similar manner, Nayeon also had her most dominant role in What Is Love and Cheer Up.

  • The tempo and beat style of the song part also matters. Dahyun’s best metrics occur during her fast raps in Cry For Me and iconic “neomuhae” hook in TT. Jeongyeon also had relatively faster and more back-and-forth parts in Signal and Yes Or Yes.

  • But song part placement is also important. In particular, prechorus lines often translate to dominant and connective roles. Momo played her most dominant and connective role through her prechorus lines in Feel Special (the only part that repeats aside from the chorus). Tzuyu also had her most dominant song in Feel Special, as she sang the prechorus lines there with Momo, and served her most connective role in Heart Shaker’s prechorus (and chorus too). The same can be said for Mina in Cheer Up and Yes Or Yes.

In summary

We have shown that a song performed by a group could be represented as a graph through the line succession network.

As the network that represents the group’s most frequently used succession patterns, the ALL SONGS network shows the most distinctive parts of a TWICE song, and its edges account for around a third to half of the edges in each TWICE Korean single so far.

The betweenness and eigenvector centrality metrics, when applied in line succession networks, can be thought of as a measure of the strength of a member’s connective and dominant/accent roles, respectively.

Although members with longer lines definitely have an advantage, a longer line duration does not necessarily translate to higher centrality metrics in a song. As we have seen, it all depends on the structure of the song’s network.

Where can I take this idea further?

The same techniques can be applied when you replace the TWICE songs with basically any set of objects whose relationships you can map out in a network (e.g. e-commerce products, protein sequences, infection spreaders, etc.).

Still, I often wonder why we seldom see directed networks used in data science projects. It is understandable, since ordered relations can be quite tricky to find. But as we have done for TWICE songs, it can be as simple as encoding consecutive occurrence. This matters because direction captures the order in which people arrange things in their minds, and that, in itself, is valuable information!

For those who want to do a similar analysis using networks, I invite you to think of ways to encode direction when creating your network’s edges. You may be surprised by the results it gives you.



Thanks and see you again in the next blog!

References

Zinoviev, D. (2018). Understanding Social Networks. In Complex network analysis in Python: Recognize-construct-visualize-analyze-interpret (pp. 57-58). Pragmatic Bookshelf.