Monday, August 10, 2015

Degree of separation of Sri Lankan Twitter community


I recently moved to Arch Linux :D While backing up old project files I came across some of the experiments[1][2] I did on social network analysis (SNA). Thought I’d document at least some of the stuff before I forget everything :D
Inspired by the Erdős number and the Bacon number, I wanted to calculate the degree of separation of the Sri Lankan Twitter community.
The toughest and most tedious part was the data collection phase. What I wanted was a graph of Sri Lankans with users as nodes and friendships as edges. Using Twecoll on the Bestatlk Twitter account, I scraped its followers and the followers of those followers. The whole process took an entire week because of Twitter’s API limitations. At the end, a GML file was generated.
I then ran the GML file through a Python script to extract the Twitter IDs and build a dictionary whose keys are user IDs and whose values are lists of the Twitter IDs of that user’s friends.
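I no longer have the original script at hand, so here is a minimal sketch of that step using networkx. The tiny GML snippet and the IDs in it are made up, and I'm assuming the node label holds the Twitter ID; the file Twecoll produces may be laid out differently.

```python
import networkx as nx

# Made-up stand-in for the scraped GML file (assumption: label = Twitter ID).
SAMPLE_GML = """
graph [
  directed 1
  node [ id 0 label "1001" ]
  node [ id 1 label "1002" ]
  node [ id 2 label "1003" ]
  edge [ source 0 target 1 ]
  edge [ source 0 target 2 ]
  edge [ source 1 target 2 ]
]
"""

def friends_by_user(graph):
    """Map every node to the list of nodes it has an edge to."""
    return {node: list(graph.neighbors(node)) for node in graph.nodes}

# parse_gml reads GML text; nodes are named after their "label" by default.
graph = nx.parse_gml(SAMPLE_GML)
friends = friends_by_user(graph)
print(friends)
```

For a file on disk, nx.read_gml with the path to the file should do the same job as parse_gml does here.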
The next step was to find the shortest path between every pair of nodes in the graph and take the longest of those, which is the degree of separation (the diameter of the graph).
My first attempt was to run a breadth-first search for every pair of nodes. Needless to say, that was extremely slow. Then I found out about the Floyd-Warshall algorithm, which finds the shortest paths between all pairs of nodes in a graph. It also turned out that networkx has a built-in method for it :D So, as a perfectly normal lazy human being, I ended up using networkx instead of implementing the algorithm myself.
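Roughly, the networkx route looks like the sketch below. I'm assuming nx.floyd_warshall here as the built-in in question, and the four-user chain is a made-up stand-in for the real graph. Unreachable pairs come back as infinity, so they are skipped before taking the maximum.

```python
import math
import networkx as nx

def degree_of_separation(graph):
    """Longest finite shortest-path length over all pairs of nodes."""
    # floyd_warshall returns a dict of dicts: dist[u][v] is the path length,
    # or infinity when v cannot be reached from u.
    dist = nx.floyd_warshall(graph)
    finite = [d for row in dist.values() for d in row.values() if math.isfinite(d)]
    return int(max(finite))

# Stand-in graph: a chain of four users, so the two ends are 3 hops apart.
chain = nx.path_graph(4)
separation = degree_of_separation(chain)
print(separation)
```

Keep in mind that Floyd-Warshall is cubic in the number of nodes, so it gets slow and memory-hungry on a big graph.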
The degree of separation turned out to be 5.
In other words, if you’re a Sri Lankan on Twitter, any other Sri Lankan on Twitter is at most a friend of a friend of a friend of a friend of a friend.
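To check how far apart two particular users are, a single breadth-first search over the friends dictionary is enough. This is only a sketch: hops_between is a name I made up for illustration, and the users in the sample dictionary are placeholders.

```python
from collections import deque

def hops_between(friends, start, goal):
    """Return the number of hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        for friend in friends.get(user, []):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None

# Placeholder data: a-b-c-d form a chain, e has no friends at all.
sample_friends = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": [],
}
print(hops_between(sample_friends, "a", "d"))
```

With a degree of separation of 5, no pair of users in the collected graph that can reach each other should ever give more than 5 here.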
Disclaimer
  • Data were collected in January 2014, so the community has probably grown since then.
  • Bestatlk doesn’t follow all the Sri Lankans on Twitter, so the results are not based on the entire community.
P.S. I also fed the data to some other algorithms in an attempt to find the most influential Sri Lankans on Twitter. I guess I’ll save that for another post. Or maybe not.
