PrivSense
Introduction
- A user's location or phone number can be sold to marketing agencies, which can then use the data for targeted marketing, steering the user's decisions toward their products.
- An abuser may use this information to stalk or maintain power over the user, invading their privacy.
- Mobile devices can be hacked through URLs or images sent in text messages.
- Hackers can use mobile numbers to commit fraud by impersonating the user.
- A user's name or email ID can be used to track them across social media platforms. This is useful in itself because it helps people connect, but it also exposes the user to a blending of their public and private lives.
- Moreover, people with bad intentions can create fake profiles of the user on any online social media (OSM) platform without the user's knowledge.
- An Aadhaar number uniquely identifies a resident of India. The number is linked to the holder's information, from their name to their biometric data.
- If it falls into the wrong hands, an abuser might use this information to create a fake identity of the user and commit crimes in their name.
There is no targeted way to identify the privacy concerns raised by such information when it is shared on OSM platforms. This calls for a real-time feedback system that points out the privacy status of the data being revealed online.
Problem Statement
People are often unaware of the sensitivity and consequences of the information they share daily on Online Social Media (OSM) platforms.
Problem Breakdown
- Does the given post contain any private or sensitive information about the user?
- How can people be made aware of the consequences of sharing their private information on OSM platforms?
- How can information be shared publicly without leaking private data online?
Proposed Solution
- Nudges: The system does not aim to manipulate the user's choices but to make him/her aware that he/she is making such choices. Hence, a nudging system suggests changes that the user can make to limit the private information being leaked.
- Sensitivity Score: Every private entity is assigned a normalised weight based on a survey we conducted inside IIIT Delhi. The score is computed as the weighted sum of the non-sensitive entities divided by the weighted sum of all words, multiplied by the weighted sum of the private entities mentioned in the text. A low score indicates that more private information is being shared, and vice versa.
- Consequences: The system informs the user about the ways in which personally identifiable information (PII) can be misused if it is leaked online. This makes the user aware and helps him/her decide whether to share that particular information.
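Read literally, the Sensitivity Score description above corresponds to the expression below. The symbols are notation introduced here for illustration only: $w(t)$ is the normalised weight of token $t$, $T$ is the set of all words in the text, $N \subseteq T$ the non-sensitive entities, and $P \subseteq T$ the private entities. How non-sensitive words are weighted and how the result is scaled are not specified, so this is a transcription of the sentence rather than the system's exact implementation.

```latex
S = \frac{\sum_{t \in N} w(t)}{\sum_{t \in T} w(t)} \cdot \sum_{t \in P} w(t)
```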
Models
User Study
- Most respondents were willing to share their name (~88%) and religion (~70%).
- About 65% of respondents were willing to share their email ID.
- Respondents said they were not willing to share information about their phone, location, or medical history.
- However, as we will show and explain later, users online were more likely to share such information, contrary to what they claimed in the survey.
Dataset
Dataset Selection
- Reddit Mental Health - https://osf.io/7peyq/
- Reddit Data Huge - https://www.kaggle.com/prakharrathi25/reddit-data-huge
- Twitter - https://www.kaggle.com/ywang311/twitter-sentiment
Dataset Description
- Author - Contains the username of the account that made the post
  - Many users deleted their accounts, so the largest number of posts falls under the deleted section.
  - Among named accounts, the most posts come from the username Automoderator (23 posts).
- Date - Contains the date on which the post was created
  - The dataset consists of posts under the given subreddits, and the graph below shows their creation dates, ranging from Jan 2019 to April 2020.
  - No data is present in the dataset between mid-April and November 2018 or between November and January 2019.
  - Most posts are from the month of January.
- Post - Contains the content of the post
  - The word cloud shows the most frequent words in the posts.
  - Most of them are general words used in day-to-day life to convey feelings.
- Subreddit - Contains the category of the post
  - The largest numbers of Reddit posts fall under the categories personal finance (more than 120k) and depression (more than 110k).
  - Most posts therefore relate to personal finance or to depression.
Working
Flow Chart
PII Detection
In the above example on Twitter, the box in the bottom-right corner displays the PIIs that the user has mentioned in their tweet. While the user writes the tweet, the extension nudges them by displaying how much private information the post gives away. A lower score implies that the text leaks more private information, whereas a higher score means less private information is leaked.
In the example, the system detects PIIs such as name and nationality and conveys them to the user. The user also talks about their political preference in the tweet and has therefore leaked a political opinion on a public platform. The extension identifies this and informs the user that the tweet conveys a political context. The tweet is assigned a score of 20, which implies that a large amount of private information is being leaked. The name carries little weight in the privacy score calculation, but nationality and political context are weighted more heavily, which leads to the low score of 20.
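The detection models behind the extension are not described here, so the following is only a minimal sketch of what a PII detection step could look like for a few pattern-based PII types. It uses regular expressions as a stand-in; the names `PII_PATTERNS` and `detect_pii`, and the patterns themselves (Indian-style mobile numbers, 12-digit Aadhaar-like numbers), are assumptions for illustration. Context-dependent PIIs such as nationality or political opinion would need more than regular expressions.

```python
import re

# Hypothetical patterns for illustration; not the extension's actual models.
PII_PATTERNS = {
    # Simple email shape: local part, "@", domain with at least one dot.
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # Assumed Indian mobile format: optional +91 prefix, then 10 digits starting 6-9.
    "phone": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
    # Assumed Aadhaar-like shape: 12 digits, optionally grouped 4-4-4.
    "aadhaar": re.compile(r"(?<!\d)\d{4}[\s-]?\d{4}[\s-]?\d{4}(?!\d)"),
}


def detect_pii(text):
    """Return a dict mapping each PII label to the matching substrings found in text."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


if __name__ == "__main__":
    draft = "Mail me at asha@example.com or call +91 9876543210"
    print(detect_pii(draft))
```

A draft with no matching pattern yields an empty dict, which a caller could treat as "nothing to nudge about".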
Nudges
The extension detects the PII and, at the same time, informs the users about the different ways in which that particular PII can be misused against them. On hovering over the information icon, the user can view some information about the misuse of that PII. For further insights, the extension also offers a webpage that briefly describes the various consequences of releasing different PIIs online.
The page explains how different pieces of private information can be misused against the user to invade their privacy.
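The lookup from a detected PII to its misuse message could be sketched as below. This is a hypothetical illustration, not the extension's code: the `CONSEQUENCES` table and `build_nudges` function are assumed names, and the messages are paraphrased from the risks listed in the Introduction.

```python
# Hypothetical mapping from PII type to a short misuse message,
# paraphrased from the risks listed in the Introduction.
CONSEQUENCES = {
    "location": "Can be sold for targeted marketing or used by an abuser to stalk you.",
    "phone": "Can be used to commit fraud by faking your identity.",
    "email": "Can be used to track you across social media platforms.",
    "name": "Can be used to create fake profiles of you without your knowledge.",
    "aadhaar": "Can be used to create a fake identity and commit crimes in your name.",
}


def build_nudges(detected_types):
    """Return one nudge message per detected PII type that has a known consequence."""
    return [
        f"{pii_type}: {CONSEQUENCES[pii_type]}"
        for pii_type in detected_types
        if pii_type in CONSEQUENCES
    ]


if __name__ == "__main__":
    for nudge in build_nudges(["email", "phone"]):
        print(nudge)
```

PII types without an entry in the table are skipped silently, so the table can grow independently of the detection step.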