Suno Sabki Karo Apni (SSKA)

The name of this web extension is an abbreviation of an age-old Hindi adage, Suno Sabki Karo Apni, which means: listen to everyone, but do your own thing.

This is a core principle of democracy: people should know all their choices and have complete control over their final decision. Yet social media is often said to warp democracy by keeping users in a bubble and preventing them from seeing the whole picture. SSKA is a way to reinforce democratic principles in the social media realm.


PSOSM and Team 5 

A course called Privacy and Security in Online Social Media (PSOSM) was offered in the Winter Semester 2021 at IIITD. People took this course either to learn to use social media APIs such as the Twitter API or to add a project to their resumes. The PSOSM project constituted 45% of the total grade. The professor suggested project teams of 8-10 members each. Usually, course teams consist of 4-5 members, which can be managed well within friend circles. Teams of this size were a challenge; in fact, forming one felt more like battalion recruitment.

The success of a project depends entirely on how good your team is, so forming a team was a very important task. After 3-4 days spent calling, texting, and emailing the entire class, 24 teams were formed.

We formed Team 5. It consisted of 9 members in total: 3 CSE students, 5 CSD students, and 1 CSB student. We were proud of the varied skill sets our team members possessed. Some gave human-like intelligence to computers as Machine Learning engineers, while others cast magic with their front-end skills. Some had exceptional writing skills, while others had a zest for video editing. We were proud of the battalion we had formed; it had already started feeling like a family.

From top to bottom, left to right: Yashdeep Prasad, Ira Aggarwal, Shaney Waris, Shreeya Garg, Om Khandade, Harman Singh, Abhinav Sharma, Kunal Anand, Sonali Supriya.  

Motivation & Problem Statement 

After the team-forming shenanigans were over, classes started. In every class, we were assigned one Netflix movie related to the class topics (watching movies was our homework, which is as cool as it gets at IIITD). One of the movies we saw was The Social Dilemma. It talked about how social media tracks every moment and interaction of its users and traps them in an echo bubble. The movie was so impactful that people actually revised their social media usage and digital footprints. Most of us first heard of echo bubbles from this movie, and this was the seed of the idea that finally led to the birth of the SSKA extension.

We figured that if we, as tech-savvy engineering students, found it hard to stay vigilant about these echo bubbles, then for the average lay user it was a critically concerning issue. Then we observed our parents: they primarily used 2 social media platforms for information sharing, WhatsApp and YouTube. For the scope of this project, we targeted YouTube. Initially, we made a video to showcase the intensity of the problem and our proposed solution.


Final Problem Statement

The YouTube video recommendation algorithm keeps users in a loop of similar content and fails to provide wholesome coverage. It traps users in an echo chamber/misinformation bubble. In controversial topics, loops of biased information polarize the user's opinion without their knowledge.

 

Proposed Solution  

SSKA is a web extension that runs on top of YouTube. It is activated whenever users watch content related to sensitive topics: it alerts them and gives 10 recommendations that provide wholesome coverage of the topic.

All recommendations will be completely random, i.e. SSKA will not track user interactions at all, so users are truly free when using SSKA. Each recommendation will be color-coded, i.e. videos supporting a particular point of view will be represented by the same color, so that people know beforehand what to expect from a recommended video.

Finally, SSKA will show statistics of the recommendations and a legend for the color codes.
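To make the idea concrete, here is a minimal Python sketch of history-free, color-coded recommendations. It is an illustration under assumptions, not the extension's actual code: the function names `recommend` and `summarise`, the label names, and the uniform random sampling are all hypothetical, and the purple/yellow/green scheme is borrowed from the colours described in the User Interface section of this post.

```python
import random
from collections import Counter

# Assumed colour scheme: one colour per point of view, green for neutral videos.
COLOURS = {"pov1": "purple", "pov2": "yellow", "neutral": "green"}


def recommend(candidates, k=10, rng=None):
    """Pick up to k videos uniformly at random, using no user history at all.

    candidates is a list of (title, label) pairs, where label is a key of COLOURS.
    """
    if rng is None:
        rng = random.Random()
    picked = rng.sample(candidates, min(k, len(candidates)))
    return [
        {"title": title, "label": label, "colour": COLOURS[label]}
        for title, label in picked
    ]


def summarise(recommendations):
    """Count how many recommended videos fall under each point of view."""
    return dict(Counter(rec["label"] for rec in recommendations))


if __name__ == "__main__":
    # Hypothetical candidate pool: 5 videos per label.
    pool = [(f"video {i}", label) for i, label in enumerate(list(COLOURS) * 5)]
    recs = recommend(pool, k=10)
    for rec in recs:
        print(rec["colour"], rec["title"])
    print(summarise(recs))
```

Because the sample depends only on the candidate pool and a random number generator, nothing about the user needs to be stored, which is the property the proposal relies on.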

In conclusion, the SSKA extension reinforces users' right to choice: it does not track user interactions or store any user data, so the recommendations are not biased by a user's history. SSKA gives users full knowledge of their choices (with the help of color codes, statistics, and legends) and, finally, reinforces the user's freedom to make the final decision about the content they wish to consume.


Initial Journey 

We were building a prototype within the duration of our course, so we narrowed our scope down to a single sensitive topic: Article 370. To realize our vision for the SSKA extension, our team started collecting data on Article 370. Each of us collected 8 videos, giving a dataset of 8 × 9 = 72 videos. Then we created a complete NLP-ML pipeline for this dataset.

We extracted titles, tags, and descriptions from each of these videos and created feature vectors using advanced tools such as Google universal transformer. Then we applied ML models to these vectors. Unfortunately, the results weren't up to the mark. In one of the regular project evaluations, our professor remarked that we must extend our dataset to see proper results.


This was very valuable feedback. First of all, we extended our domain from 1 topic to 2: our prototype now covered USA Elections 2020 along with Article 370. We used the YouTube API to collect videos on these 2 topics and expanded our dataset from 72 videos to a total of 1100+ videos (nearly 530 videos on Article 370 and nearly 600 on the USA Elections). To make our dataset even more robust, we extracted the top 5 comments from each video to predict whether it was a positive, negative, or neutral opinion video. Our previous NLP-ML pipeline produced beautiful results on this dataset. Additionally, we performed some manual data cleaning so that our unsupervised model would form better clusters, and we tuned the hyperparameters of the model to give accurate results. We were finally back on track!


Fig. Clusters on 72 datapoints
Fig. Clusters on 1100+ datapoints


User Interface

A tech product is only as good as its UI. With 5 team members formally studying design, we aimed to make the user experience as intuitive as possible. Our alerts show green for non-controversial content and red for controversial content. For our video colour codes, we initially used red and green to show different points of view, but then realised that those colours could be interpreted as "biasing" users towards the "green" colour-coded videos. We wanted to avoid this, so we chose unbiased colours: purple, yellow, and green, where green always indicates a neutral video, as those videos provide the holistic picture of the entire topic. Purple and yellow are complementary colours, so we chose them for the different points of view. If a topic has more than two points of view, we can display them with more colours without getting stuck. We also created numerous screens on Figma to properly visualise the UI and get the professor's and users' feedback.


Last Few Days

Now we were ready with the backend code and the UI designs. But our struggles were far from over. 

The task at hand was implementing the extension in web browsers and integrating it with the machine-learning-based backend. To take on this task, the team was initially divided into two groups, one for the user interface and the other for the backend integration. None of us were that comfortable with JavaScript, so converting the entire backend from Python to JavaScript seemed daunting.

Initially, we tried shortcuts like Brython, but when that failed miserably, we went back to using only JavaScript for the backend. We took each challenge in our stride, and after 2 all-nighters, the team had successfully built the UI with native HTML and CSS and the backend of the extension with vanilla JavaScript.



This is the final product that the team arrived at. The extension informs the user whether or not the video they are watching is about a controversial topic. Next, it tells the user which opinion or side of the story on that topic the video represents. The extension then suggests an array of related videos, labeled according to point of view (with colour-coded labels as easy visual cues), and presents statistics about the recommendations so that the user can make an informed choice. These informed choices enable our users to break out of the clutches of these virtual echo chambers.


End-User Review & Analysis 

The next step was analyzing the success of our extension in bringing users out of these echo chambers. A logical way to analyze this is to measure the amount of polarization users were exposed to before using the extension and compare it with the amount after using the extension.


To measure the polarization experienced by the users, an evaluation metric called the Coefficient of Polarization was defined.


Coefficient of Polarization:

For a topic, let there be two opposing points of view, Pov1 and Pov2, and a Neutral Pov. Let the number of videos watched for each point of view (Pov) be p1, p2, and n respectively.


Coefficient of Polarization = |p1 - p2|/(p1 + p2 + n)


The higher the coefficient of polarization, the more polarization a user is exposed to. The maximum value is 1.0 (100%), which means that the user is completely polarized towards one particular point of view on the topic in consideration. The minimum value is 0.0 (0%), which means that the user is completely neutral in their approach towards the topic in consideration, i.e. they watched equally many videos from both points of view.

An experimental setup was created wherein each team member was asked to recruit 3 volunteers, preferably two older volunteers and one volunteer of a similar age group. This was done to ensure that the extension was tested across all age groups. In the experiment, each volunteer was given a fixed span of time to watch as many videos as they wanted on a particular controversial topic without the use of the extension. The number of videos watched by each subject for each point of view on the topic was recorded to generate their coefficient of polarization. The same task was then repeated with the same users, but this time with the SSKA extension enabled.
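The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `coefficient_of_polarization` is hypothetical, and returning 0.0 when no videos were watched is an assumption, since the definition does not cover that case.

```python
def coefficient_of_polarization(p1: int, p2: int, n: int) -> float:
    """Return |p1 - p2| / (p1 + p2 + n) for one viewer's watch counts.

    p1 and p2 are the numbers of videos watched for the two points of view,
    and n is the number of neutral videos watched.
    """
    if min(p1, p2, n) < 0:
        raise ValueError("video counts must be non-negative")
    total = p1 + p2 + n
    if total == 0:
        # Assumption: with no videos watched there is no measurable polarization.
        return 0.0
    return abs(p1 - p2) / total


if __name__ == "__main__":
    # Hypothetical watch counts, not data from the experiment.
    print(coefficient_of_polarization(8, 0, 0))  # only one point of view watched
    print(coefficient_of_polarization(3, 3, 2))  # both points of view watched equally
    print(coefficient_of_polarization(6, 2, 2))  # leaning towards one point of view
```

With these counts, the three calls give 1.0, 0.0, and 0.4 respectively, matching the maximum, the minimum, and an in-between case of the metric.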

Fig. Snapshot of Data Collected


Fig. Coefficient of Polarization of users before and after using the SSKA Extension


The graphs depict the success of the SSKA extension: users had a high coefficient of polarization without the extension, and after using the SSKA extension, their coefficient of polarization came down significantly.


The graph of percentage reduction also supports the conclusion that SSKA reduces the polarization experienced by users to a large extent and thus helps free them from echo chambers.
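The post does not spell out how the percentage reduction was computed. One plausible reading, sketched here in Python as an assumption, is the relative drop in a user's coefficient of polarization from before to after using the extension. The function name `percent_reduction` and the sample values are hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, expressed as a percentage.

    Assumed definition: (before - after) / before * 100.
    A negative result means the coefficient went up instead of down.
    """
    if before == 0:
        # Assumption: a user who was not polarized to begin with shows no reduction.
        return 0.0
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Hypothetical coefficients for one volunteer, not data from the experiment.
    print(percent_reduction(0.5, 0.25))
```

For this hypothetical volunteer, a drop from 0.5 to 0.25 corresponds to a 50% reduction.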


Github Link - https://github.com/ShaneyWaris/SSKA_Web_Extension


Feedback from the professor

PK profile - https://iiitd.ac.in/pk

On the day of our final presentation, we waited for our turn with bated breath. Finally, we entered the room and presented our SSKA extension. PK sir said it was a “pretty cool” extension and that we should make our dataset public. He also said we should definitely scale it and convert it into an industrial-grade product. We thanked him for his teaching and valuable feedback, and we felt rewarded after this evaluation. Cheers :D


Future Work 

The team plans to extend the extension to a larger array of controversial topics, so that users can be freed from most echo chambers and content consumers can gradually become informed users with clarity about the ingredients of the content they consume. Another planned expansion is the introduction of watch time analysis, which will further enable users to analyze trends in their consumption.

