Dawn of Decentralization

Introduction

Social media is one of the cornerstones of people's lives these days. People use it to follow friends and famous people, see what they're up to, and share knowledge and entertainment.

Among these platforms, one of the most prominent is Twitter, a micro-blogging platform where people of all kinds share bite-sized updates about themselves or the things around them. It's so popular that even governments use it to connect with the public and disseminate information.

Sounds fine and dandy so far. But alas, nothing is perfect, and Twitter is no exception. Despite its massive scale, Twitter is still under the control of a single organization. All data on the platform goes through that organization, which has the final say on who stays on the platform and who gets to see what.

Of course, many of us would prefer otherwise, so a new social network came along, called Mastodon. It claims to use federation to solve the problem of a single entity holding all the control.

What's federation, you ask? It is a model of control in which multiple independent service providers interoperate to provide a single service; email is the classic example. Mastodon is a social network that uses federation to distribute control of data and give users more freedom.
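
To make the email analogy concrete, here is a minimal sketch of how a federated handle names both a user and the instance that hosts them. It assumes the usual `@user@domain` handle form and the standard WebFinger lookup path; the function names and the example handle are ours, purely for illustration.

```python
from urllib.parse import urlencode


def parse_handle(handle: str) -> tuple[str, str]:
    """Split a federated handle such as '@alice@example.social' into (user, domain)."""
    user, sep, domain = handle.lstrip("@").partition("@")
    if not user or not sep or not domain:
        raise ValueError(f"not a federated handle: {handle!r}")
    return user, domain


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL served by the handle's home instance."""
    user, domain = parse_handle(handle)
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


# Hypothetical handle: the domain part decides which provider answers the lookup.
print(webfinger_url("@alice@example.social"))
```

Just as with an email address, the part after the last `@` decides which provider is asked about the account, so no single organization has to know every user.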

That sounds quite appealing, but as usual, nothing should be taken at face value. Hence, our project explores the differences between centralized Twitter and decentralized Mastodon so that end users can make a more informed choice about which platform suits their needs.

Problem Statement

To compare a centralized microblogging platform, Twitter, with a decentralized one, Mastodon, in order to inform end users about the privacy implications of each.

We have covered three aspects in this project: Data Privacy, Gender Recognition, and Location Privacy.

Data Privacy

In data privacy, we consider how much personally identifiable information we can extract about an end user and whether we can build a profile from it.

For Twitter, we used the API to see what we could extract from 100 user IDs and 4000 tweets.

For Mastodon, we used a bot to scrape data from 8 instances and 160 toots to see what information we could get from them.
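
As a rough illustration of what such a bot would request, the sketch below builds URLs for Mastodon's public timeline endpoint. It is not the bot we used: the instance names are placeholders, the even split of 160 toots across 8 instances (20 each) is an assumption, and whether the endpoint answers without authentication depends on each instance's settings.

```python
from urllib.parse import urlencode


def public_timeline_url(instance: str, limit: int = 20, local: bool = True) -> str:
    """Build the URL for an instance's public timeline endpoint."""
    if not 1 <= limit <= 40:  # assumed per-request cap; check the instance's API docs
        raise ValueError("limit must be between 1 and 40")
    query = urlencode({"limit": limit, "local": str(local).lower()})
    return f"https://{instance}/api/v1/timelines/public?{query}"


def plan_requests(instances, toots_per_instance):
    """Map each instance name to the URL a scraper would fetch."""
    return {name: public_timeline_url(name, toots_per_instance) for name in instances}


# Placeholder instance names; 8 instances x 20 toots = 160 toots (assumed split).
instances = [f"instance-{i}.example" for i in range(1, 9)]
for name, url in plan_requests(instances, 20).items():
    print(name, url)
```

Each response would be a JSON array of toot objects, whose keys are the attributes that could then be tallied.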

Here are the pie charts we generated from the attributes that we could extract:

Tweet ID:

Toot ID:

Twitter User ID:

Mastodon Account ID:

At first glance, it looks like Mastodon collects a lot more data than Twitter does, judging simply by the number of attributes that could be collected.
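
One simple way to turn scraped records into this kind of comparison is to count how often each attribute actually carries a value. The sketch below assumes each record is a plain dictionary parsed from JSON; the sample records are invented for illustration and are not our data.

```python
from collections import Counter


def attribute_counts(records):
    """Count, per attribute name, how many records carry a non-empty value."""
    counts = Counter()
    for record in records:
        for key, value in record.items():
            if value not in (None, "", [], {}):
                counts[key] += 1
    return counts


def attribute_shares(records):
    """Turn the counts into fractions of all filled-in values, e.g. for a pie chart."""
    counts = attribute_counts(records)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()} if total else {}


# Invented sample records, only to show the shape of the input.
sample = [
    {"id": "1", "content": "hello", "language": None},
    {"id": "2", "content": ""},
]
print(attribute_counts(sample))
print(attribute_shares(sample))
```

Counting only non-empty values matters here, because an attribute that exists in the schema but is always blank reveals nothing about the user.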

But, as Mastodon is federated and open-source, we could set up our own instance and check the admin interface to see what data the service provider sees:

From this, we could observe that Mastodon is actually a lot more transparent about what data it collects.

In comparison, we cannot make any statements about what Twitter collects, as its admin interface is accessible only to Twitter Inc. employees.

On the flip side, it should be noted that Twitter has much more polished tools for accessing data, owing in large part to the platform's age compared to Mastodon.

Gender Recognition

In gender recognition, we consider whether an end user's gender can be inferred and whether marketers could take advantage of that to target their ads.

For Twitter, we collected 4000 random accounts and ran a random-forest classifier.

For Mastodon, we chose the most popular instance, koyu.space, and ran a Naive Bayes classifier on the account IDs we could collect.
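
For readers unfamiliar with Naive Bayes, here is a minimal, dependency-free sketch of the technique. It is not the classifier we ran: the choice of whitespace tokens as features, the placeholder class labels, and the toy training texts are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.lower().split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for tok in text.lower().split():
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy data with placeholder tokens and labels; not drawn from any real account.
train_texts = ["alpha beta", "alpha gamma", "delta epsilon", "delta zeta"]
train_labels = ["class_a", "class_a", "class_b", "class_b"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("alpha"), accuracy(model, train_texts, train_labels))
```

In a real study the accuracy would be measured on held-out accounts rather than on the training data, as is done here only to keep the sketch short.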

We found that classification accuracy was comparable on both platforms. But it should be noted that Mastodon doesn't come with any built-in advertising integration, and it would require major restructuring on the part of the instance owner to make advertising viable.

Location Privacy

In location privacy, we check whether it is possible to infer a user's location from their activity.

For Twitter, we collected 3298 random accounts and tried to use the API to get geolocation.

For Mastodon, we took 8 accounts each from 8 popular instances, and tried to scrape geolocation data.

We found that Mastodon does not collect any location-related metadata, so geolocation could not be inferred from the data we scraped.

In contrast, Twitter attaches location metadata to many tweets, and even when a tweet carries no such metadata itself, its location could be inferred from a friend's location tagging.
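
The friend-based inference can be sketched as a simple majority vote. This is one possible heuristic under our own assumptions, not the exact method we applied: the function name, the `min_share` threshold, and the sample place names are illustrative.

```python
from collections import Counter


def infer_location(own_location, friend_locations, min_share=0.5):
    """Return the user's own tag if present, else the friends' dominant tag.

    Returns None when no friend is tagged or no single location reaches
    the min_share fraction of tagged friends.
    """
    if own_location:
        return own_location
    tagged = [loc for loc in friend_locations if loc]
    if not tagged:
        return None
    location, votes = Counter(tagged).most_common(1)[0]
    return location if votes / len(tagged) >= min_share else None


# Illustrative input: the user has no tag, two of three tagged friends agree.
print(infer_location(None, ["Delhi", "Delhi", "Mumbai", None]))
```

The threshold keeps the guess from being driven by a single friend when the tagged locations are scattered.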

User Survey

After doing the research, we prepared a survey to gauge user interest and needs. We presented our research findings to participants beforehand so that they could make informed decisions.

The following are the inferences we obtained:

Users have started exploring other social media platforms.

The way data is managed in decentralized platforms is very attractive to many people.

However, users also tend to judge how good a platform's data controls are by how polished its tools for accessing the data are.

Location privacy is not an important factor for most people in the survey.

Conclusion

Our research surfaced some fundamental differences in the way Mastodon and Twitter work, and the survey indicated that participants found these differences helpful and interesting. Hence, we maintain that Mastodon and, by extension, federated services are an important avenue to explore in the field of social networks.

Future Work

Further research is needed on tools for extracting data from Mastodon. We also need to look into how privacy is affected when Mastodon is run in single-user instance mode, which effectively makes the user an instance by themselves.

Team Members

Zeya Umayya
Aman Chaudhary
Mansi Bansal
Hitesh Arora
Meenakshi Das
Kolla Nikhil
Prakrati Gupta
Vandana
Koustuv Kanungo

CSE648-PSOSM-2021