PrivSense

 


Privacy Detector Overlay For OSM Platforms




During our course CSE648: Privacy and Security in Online Social Media at IIIT Delhi, our team developed PrivSense, a Chrome extension that acts as a privacy detector overlay for online social media (OSM) platforms.



Introduction





Privacy on online social media platforms refers to how well a user's personal information is protected while they are connected to the internet. There have been numerous reports of personal data being leaked online. The main problem is that people are unaware that they inadvertently reveal most of their private information themselves, through the personally identifiable information (PII) they post on platforms such as WhatsApp, Twitter, LinkedIn and Facebook.

Even the simplest information, such as location or sexual orientation, can raise serious privacy concerns:

Location / Phone Number
  • A user's location or phone number can be sold to marketing agencies, which can then use it to steer the user's decisions towards their products (targeted marketing).
  • An abuser may use this information to stalk or maintain power over the user, invading their privacy.
  • Mobile devices can be hacked via URLs or images sent in text messages.
  • Hackers can use mobile numbers to commit fraud by faking the user's identity.

Name / Email ID
  • A user's name or email ID can be used to track them across social media platforms. This is useful in that it helps people connect, but it also exposes the user to a merging of their personal and professional lives.
  • Moreover, people with bad intentions can create fake profiles of the user on any OSM platform without their knowledge.

Aadhaar Number
  • An Aadhaar number uniquely identifies a resident of India. It is linked to the holder's records, from their name to their biometric information.
  • If it falls into the wrong hands, an abuser might use it to create a fake identity of the user and commit crimes in their name.

We have all received fraud or spam calls in which the caller poses as someone we might know. People share this information openly because they are unaware of the ways in which it can be misused. All of it is crucial from a privacy point of view, and users who keep revealing it without realising the consequences may come to harm.

There is no targeted way to identify the privacy concerns raised by such information when it is shared on OSM platforms. This calls for a real-time feedback system that points out the privacy status of the data being revealed online.



Problem Statement


There exists a lack of awareness among people about the sensitivity and consequences of information shared by them on a daily basis through Online Social Media platforms.



Problem Breakdown


Does the given post contain any private or sensitive information of the user?

How can people be made aware of the consequences of sharing their private information on OSM platforms?

How can information be shared publicly without leaking private data online?



Proposed Solution


We propose a system that makes users aware of the types of private information they share online through their posts or messages on OSM platforms. The system analyses the granularity of disclosure control and detects the keywords in the text that reveal the user's private information. Detection considers both direct and indirect mentions of private information.
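As a rough illustration of the direct-mention part of this detection, the sketch below flags email IDs, Indian mobile numbers and Aadhaar-style numbers with regular expressions. The post does not describe how the PrivSense models are implemented, so the `PII_PATTERNS` table, its labels and the `detect_direct_pii` function are assumptions made for illustration, not the extension's actual method. Indirect mentions would need a trained model and are not covered here.

```python
import re

# Illustrative patterns only: labels and regexes are assumptions, and they
# check the shape of a value, not whether it is a valid, issued identifier.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Ten-digit Indian mobile number, optionally prefixed with +91.
    "PHONE": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
    # Twelve digits, optionally grouped 4-4-4, not starting with 0 or 1.
    "AADHAAR": re.compile(r"(?<![\d+])[2-9]\d{3}[\s-]?\d{4}[\s-]?\d{4}(?!\d)"),
}


def detect_direct_pii(text):
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    hits.sort()  # order by position in the text
    return [(label, value) for _, label, value in hits]


if __name__ == "__main__":
    sample = "Call me at +91 9876543210 or mail jane.doe@example.com"
    for label, value in detect_direct_pii(sample):
        print(label, value)
```

A pattern table like this is easy to extend with further entity types, but it only catches values that follow a fixed format.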

The system nudges the user whenever our models detect such information in the text, and informs them about their private data through the following measures:



  • Nudges
    The system does not aim to manipulate the user's choices, but to make them aware that they are making such choices. The nudging system therefore suggests changes the user can make to limit the private information being leaked.

  • Sensitivity Score
    Every private entity is assigned a normalised weight based on a survey we conducted at IIIT Delhi. The score is computed as the weighted sum of the non-sensitive entities divided by the weighted sum of all words, multiplied by the weighted sum of the private entities mentioned in the text.

    A low score indicates that more private information is being shared, and a high score indicates less.

  • Consequences
    The system informs the user about the ways in which their personally identifiable information can be misused if it is leaked online. This makes the user aware and helps them decide whether to share that particular information.



Models





User Study

We conducted a survey within IIIT Delhi to find out how hesitant respondents are to release various kinds of private information on online social media platforms, and used the responses to assign the weights for our scoring system. The following are the results we obtained from this survey:
               
                                                                            
               

               

From these results, we drew the following inferences:
  • Most respondents were willing to share their name (~88%) and religion (~70%).
  • About 65% of respondents were willing to share their email ID.
  • Respondents were consciously unwilling to share information about their phone, location and medical history.
  • However, as we will show and explain later, we found that users online were more likely to share such information, contrary to what they claimed in the survey.
Hence, we can conclude that users tend to inadvertently share information that could compromise their privacy. Therefore, our initial hypothesis was validated.
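To make the link between survey responses and score weights more concrete, here is a minimal sketch of one possible normalisation. The post does not state how the weights were derived, so the rule used here (weight proportional to the share of respondents who were not willing to disclose an item) and the `reluctance_weights` helper are assumptions. Only the three approximate percentages quoted above are used.

```python
# Approximate share of respondents willing to disclose each item (from the post).
WILLING_TO_SHARE = {"Name": 0.88, "Religion": 0.70, "Email ID": 0.65}


def reluctance_weights(willing):
    """Turn willingness-to-share fractions into weights that sum to 1.

    Assumption: an item's weight is proportional to the fraction of
    respondents who were NOT willing to share it.
    """
    reluctance = {item: 1.0 - share for item, share in willing.items()}
    total = sum(reluctance.values())
    if total == 0:
        raise ValueError("every item was shared by all respondents")
    return {item: value / total for item, value in reluctance.items()}


if __name__ == "__main__":
    weights = reluctance_weights(WILLING_TO_SHARE)
    # Print items from least to most sensitive under this assumption.
    for item, weight in sorted(weights.items(), key=lambda kv: kv[1]):
        print(f"{item}: {weight:.2f}")
```

Under this assumption, name receives the smallest weight and email ID the largest of the three, which matches the ordering of the survey responses.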



Dataset

Dataset Selection


We analysed the following datasets:
After careful evaluation of the three datasets, we selected the Reddit Mental Health dataset, as it covered most of the types of personal information that we wanted to incorporate into our system.

Dataset Description


The dataset consists of 4 columns:
  • Author - Consists of the username of the account for the post



    • Many users have deleted their accounts, so the largest number of posts falls under the deleted section.
    • Among the remaining accounts, the most posts come from the username AutoModerator (23 posts).

  • Date - Consists of the date of generation of the post



    • The dataset consists of posts from the given subreddits; the graph below shows their creation dates, which range from Jan 2019 to April 2020.
    • No data is present in the dataset between mid-April and November 2018, or between November and January 2019.
    • Most posts are from the month of January.

  • Post - Consists of the content of the post



    • The word cloud represents the major words used in the given posts.
    • Most of the words used in the posts are general words that we use in day-to-day life to convey our feelings.


  • Subreddit - Consists of the category of the post



    • The largest numbers of Reddit posts fall under the categories personal finance (more than 120k) and depression (more than 110k).
    • Most of the posts relate to personal finance, and many users share content about depression in their posts.
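The per-column observations above come down to counting rows by author or by subreddit. The sketch below shows that kind of count on a four-column layout like the one described. The header names in `SAMPLE_CSV` and its rows are made-up placeholders, not data from the Reddit Mental Health dataset, and `count_by_column` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows in the four-column layout described above; the
# real file's header names and contents may differ.
SAMPLE_CSV = """author,date,post,subreddit
[deleted],2019-01-05,placeholder text one,depression
user_a,2019-01-09,placeholder text two,personalfinance
[deleted],2020-03-17,placeholder text three,personalfinance
"""


def count_by_column(csv_text, column):
    """Count how many rows share each value of the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Most frequent author and subreddit in the placeholder sample.
    print(count_by_column(SAMPLE_CSV, "author").most_common(1))
    print(count_by_column(SAMPLE_CSV, "subreddit").most_common(1))
```

For the full dataset the same function could be fed the text of the real file, provided the column names are adjusted to match its header.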



Working


Flow Chart

PII Detection

The extension is triggered whenever the user writes text on any OSM platform. Whether they are messaging someone or posting something, our system nudges the user about the amount of personal data they are leaking through their text.



In the above example on Twitter, the box in the bottom right corner displays the different PIIs that the user has mentioned in his tweet. While the user writes his tweet, the extension nudges him about the amount of private information he is giving through his post and displays it. A lower score implies that the text leaks more private information, whereas a higher score means less private information is leaked. 

In the example, our system detects the mention of PIIs such as name and nationality and conveys this to the user. Moreover, the user clearly states a political preference in the tweet, and has therefore revealed a political opinion on a public platform. The extension identifies this and informs the user that the tweet conveys a political context. The tweet is assigned a score of 20, which implies that a large amount of private information is being leaked. The name is not given much weight in the privacy score calculation, but nationality and political context carry more weight in terms of privacy, which leads to the low score of 20.



Nudges



The extension detects the PII and, at the same time, informs the users about the different ways in which that particular PII can be misused against them. On hovering over the information icon, the user can view some information about the misuse of that PII. For further insights, the extension also offers a webpage that briefly describes the various consequences of releasing different PIIs online.

The page describes how different pieces of private information can be misused against the user to invade their privacy.


Privacy Policy


We have also defined a privacy policy for our extension, where the user can view all the information about how our system uses and processes their data. This creates transparency between the users and the system and follows the data privacy guidelines set by the OECD.



The Team

Our team consists of enthusiastic computer science students from different branches and backgrounds at IIIT Delhi, with a keen interest in privacy and security in online social media. Below is a short introduction to our team members:


