Privosa by A!Z

"A child born today will grow up with no conception of privacy at all. They’ll never know what it means to have a private moment to themselves, an unrecorded unanalyzed thought. And that’s a problem because privacy matters. Privacy is what allows us to determine who we are and who we want to be."

– Edward Snowden

Introduction

Privosa is an interactive digital application designed to make users aware of how their online data can be (mis)used by third-party actors who profile them and construct personas of them, sometimes with harmful outcomes. While these personas can be helpful for recommending products and services or curating a user's experience based on their public information, they can also cause significant harm by profiling the user negatively without them even realising it. Privosa therefore highlights the bias within such algorithms, to raise awareness of how crucial data agency is for users in controlling their own experiences.

We focus in particular on the steps an algorithm takes to glean insights from raw user data, such as the content users post on social media, and on how human judgement (a negative interpretation of the results) can be inserted into this process, producing biased predictions that can negatively affect a user's job prospects, the advertisements they receive, and so on.

We do this through a small user-profiling example: a simulated experience in which a user's public Twitter data is analysed by our deliberately judgemental algorithmic framework, the details of which are explained below. Our algorithm extracts insights from the user's public profile (such as being an animal lover or a sports fan) and presents a negative interpretation of this information back to them, giving them a taste of how extreme algorithmic judgements can become when predicting behaviour from the data they are fed. Users can then interact with the identified characteristics to see how the judgement changes when the algorithm has access to different traits. Many corporations use data such as clicks, page visits, interactions with other services, and past behaviour; Privosa, by contrast, currently uses only a user's public tweets to deliver its judgemental feedback.

Our target demographic is young, urban Indian teenagers. We chose this group for two reasons. First, research shows troubling trends in data-rights awareness among teenagers in middle- and low-income countries: many are comfortable sharing their data under the mirage of personal privacy protection and control, unaware of the extent to which it can be used and in what ways, and are far less sensitive to its use by third parties. Second, focusing on this group is part of a larger effort to ensure that they mature into citizens who are aware of their data rights and of the nuance needed to understand such issues.


Process + Methodology



The first step is obtaining the user's tweets. For that, we use Tweepy to collect all of the user's tweets and store them in a local database. Fret not; the database is deleted as soon as the work related to it is done.
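The fetch-and-cache step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function assumes a Tweepy-style client with a v1.1 `user_timeline` method, and a stub client stands in for an authenticated `tweepy.API` instance so the sketch runs without credentials.

```python
import sqlite3

def fetch_and_store_tweets(api, screen_name, db_path=":memory:"):
    """Fetch a user's recent tweets via a Tweepy-style client and
    cache their text in a throwaway SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS tweets (text TEXT)")
    # Tweepy's API.user_timeline returns Status objects carrying a .text field.
    for status in api.user_timeline(screen_name=screen_name, count=200):
        conn.execute("INSERT INTO tweets VALUES (?)", (status.text,))
    conn.commit()
    texts = [row[0] for row in conn.execute("SELECT text FROM tweets")]
    conn.close()  # an in-memory database vanishes once the connection closes
    return texts

# Stub standing in for an authenticated tweepy.API object (hypothetical data):
class _Status:
    def __init__(self, text):
        self.text = text

class _StubAPI:
    def user_timeline(self, screen_name, count):
        return [_Status("I love my dog"), _Status("Great match today!")]

tweets = fetch_and_store_tweets(_StubAPI(), "example_user")
print(tweets)
```

Using an in-memory SQLite database mirrors the "deleted as soon as the work is done" promise: nothing touches disk.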



As part of our project, we also conducted a small survey to determine which identifiers and adjectives our audience prefers. Along with these labels, we curated short sentences to make the application more interactive, so that it feels as though the user is talking to the interface. Together, these curated labels and sentences deliver the opinionated remarks about the user in our application.
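A minimal sketch of the label-to-sentence pairing: the labels and remarks below are illustrative stand-ins, not the wording produced by our survey.

```python
# Hypothetical curated labels and opinionated sentences; the real survey
# produced the project's actual wording.
REMARKS = {
    "animal_lover": "All those pet photos? Someone is clearly lonely.",
    "sports_fan": "Spending weekends yelling at a screen, are we?",
    "foodie": "That feed is one long dinner menu.",
}

def remark_for(label):
    # Fall back to a generic line for labels without a curated sentence.
    return REMARKS.get(label, "We see you, '%s'. We definitely see you." % label)

print(remark_for("sports_fan"))
print(remark_for("bookworm"))
```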


For our sentiment model, we use Empath, which combines modern NLP techniques with Linguistic Inquiry and Word Count (LIWC)-style methods to assign categories and labels to text. Empath is trained on three different datasets and hence offers three models: fiction, NYTimes, and Reddit. It draws connotations between words and phrases by learning a neural embedding across more than 1.8 billion words of modern fiction, and ships with over 200 pre-validated built-in categories generated from these datasets. A new category can be created simply by providing a small set of seed words that characterise it.




The Empath model is initialised first, and our custom labels are added to it. To help the model infer each label's meaning, we also provide a few seed words per label. The custom model is then run over the set of tweets fetched from the user's profile and assigns a score to each label.
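The scoring idea can be illustrated with a pure-Python sketch. Note the simplification: Empath expands seed words through word embeddings before counting, whereas this stand-in counts only literal seed-word matches; the categories and seed words here are hypothetical.

```python
import re
from collections import Counter

# Illustrative labels with a few seed words each (Empath would expand these).
CATEGORIES = {
    "animal_lover": {"dog", "cat", "puppy", "pet"},
    "sports_fan": {"match", "goal", "team", "score"},
}

def score_labels(tweets, categories=CATEGORIES):
    """Return a normalized score per label, in the spirit of
    Empath's normalized category analysis."""
    tokens = [w for t in tweets for w in re.findall(r"[a-z]+", t.lower())]
    total = max(len(tokens), 1)
    counts = Counter(tokens)
    return {
        label: sum(counts[w] for w in seeds) / total
        for label, seeds in categories.items()
    }

scores = score_labels(["I love my dog", "Great match today, what a goal!"])
print(scores)
```

Of the ten tokens in the two sample tweets, one hits "animal_lover" and two hit "sports_fan", so the scores come out to 0.1 and 0.2 respectively.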


After the labels are scored, Privosa extracts the top 20 from the set and shows them to the user along with their corresponding sentences. The labels appear in groups of five, and the user can select or deselect the ones they dislike. This builds anticipation and improves the overall user experience of the application.
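The top-20 extraction and five-at-a-time paging described above can be sketched as a small helper (the scores below are made up for illustration):

```python
def top_labels(scores, k=20, group_size=5):
    """Keep the k highest-scoring labels and page them in groups of five,
    matching how Privosa reveals labels to the user."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]

# Hypothetical scores for 7 labels; l6 scores highest.
scores = {"l%d" % i: i / 10 for i in range(7)}
groups = top_labels(scores, k=6, group_size=5)
print(groups)  # a page of five labels, then a page of one
```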



The application also hosts information in a blog-style format to keep users updated on their data rights and on how others can use their public information malignantly. We strive to update the blog once every two weeks with new content.





Results + Analysis


Once the Privosa interface was ready, we deployed it and shared it with our target audience, collecting both qualitative and quantitative feedback. Users said the experience was intriguing and exposed them to privacy on social media like never before. All of them agreed that they had learnt something new (a), with explanations ranging from how they were perceived to how scary it was to have so much information out there. Opinions on the interface were mixed (b): most users found it funny and engaging, though some found the comments judgemental. Users said they would share it with their friends (c), as it was a fun, short, and informative activity. Some also explored the About and Learn sections of the app and found them helpful.

(a)

(b)

(c)



Other Resources


We also present a self-help web page with content users can explore to learn more about data rights and how they are affected. While individual privacy is a central theme within data rights, the easiest to understand, and the one most relevant to our project, we also believe it is paramount to educate people about how algorithms leverage power over all of us by aggregating data. In many instances, an algorithm profiles a person using data points from people "similar" to them along demographics such as race, socio-economic status, language, and culture; such systems fall under the domain of collaborative filtering. While powerful, none of these systems is free of biases and shortcomings, and users should be made aware of this because these systems ultimately impact them: in today's data economy, the user is the product. We have also created an Instagram page (link: https://www.instagram.com/privosa_org/) that follows the theme of our project and seeks to connect with our target demographic through fun, interactive content.
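To make the "profiled via similar people" idea concrete, here is a heavily simplified collaborative-filtering sketch: each user is a binary interest vector, similarity is cosine similarity, and the algorithm attributes the traits of the nearest neighbour to the target user. All names and vectors are hypothetical.

```python
import math

# Hypothetical binary interest vectors aggregated across users
# (columns: pets, sports, finance, travel).
profiles = {
    "user_a": [1, 1, 0, 1],
    "user_b": [1, 1, 0, 0],
    "user_c": [0, 0, 1, 0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(target, profiles):
    """Return the user whose interests are closest to the target's;
    a profiling system would then project that user's traits onto the target."""
    others = {k: v for k, v in profiles.items() if k != target}
    return max(others, key=lambda k: cosine(profiles[target], profiles[k]))

print(most_similar("user_b", profiles))
```

Here user_b shares two interests with user_a and none with user_c, so the system treats user_b as "like" user_a, and anything inferred about user_a may be inferred about user_b, which is exactly the aggregation risk discussed above.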


References


https://webfoundation.org/research/teenagers-on-social-media-understanding-managing-privacy/


Team

Our team had a lot of fun collaborating on this project. We were all equally involved and contributed substantially. This project would not have been possible without Professor Ponnurangam Kumaraguru's invaluable lectures, feedback, and kindness in these trying times, as well as the effort and consideration of the TAs.




Bhavika Rana - Left Top
Ujjwal Sehrawat - Left Bottom
Pranaay Saini - Center Top 
Vidit Jain - Center Center
Aniket Pradhan - Center Bottom 
Muskaan Aggarwal  - Right Top 
Jaspreet Singh Marwah - Right Bottom 




This project was carried out as part of the course Privacy and Security on Online Social Media, under the guidance of Professor Ponnurangam Kumaraguru, at Indraprastha Institute of Information Technology, Delhi in March-April 2021.  


