Privit (A PII analyser for image content)




Jane is a social media influencer who loves uploading images online, like the rest of us :) One day, she suddenly started receiving parcels and messages from unknown people. A few days later, her family started receiving the same messages. She was confused and felt threatened. She found herself in a hopeless and helpless situation because she did not know how her information had leaked.



Later, a friend visited her and told her that she may have leaked her information through her social media images. He also said that almost 1 million people across the globe face the same problem, and that such a leak can cause serious harm to the person affected. To guard against this, he uses a software tool called Privit (a PII analyser for image content).



Introduction

Social media use has increased among youngsters, who now spend most of their day on Snapchat, Instagram, etc. In the process, they unknowingly share sensitive data that could be used against them.

So, to address this problem, we thought of software that could detect sensitive information and warn users when they are sharing it. This led to the idea of Privit, a PII (personally identifiable information) analyser for images. Privit reads the text in your image and then classifies it into different categories of PII, such as phone number, PIN code, address, Aadhaar number, etc.




Methodology

The methodology can be broken into three main parts:

  • Text Detection: The first and most important step of this project was to identify whether the image contained text. We used two methods for detecting text in images. In the first technique, we resize the image and use Canny edge detection to detect boundaries, then use these boundaries to create contours. We then find the largest contour by the area it covers and use it to locate the focussed text. In the second technique, we resize the image and pass it to the EAST model for text detection. EAST is a deep neural network that is very efficient at finding text in incidental scene images. Using both methods, we reported the regions of the image that may contain text.

  • Text Recognition: The text in the images was mostly aligned but sometimes skewed. We warped these images to correct their perspective and obtain a more readable image. We used pytesseract to recognise the text in these images. The text was then stored as a list of strings and passed on for further processing.

  • PII Detection: We first created a vocabulary defining what we considered PII. We then wrote regexes for these PII types and applied them to our strings. This way we obtained all the PIIs from the images.
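To make the PII Detection step more concrete, here is a minimal sketch of how regexes could be applied to a list of recognised strings. The patterns, the category names and the find_pii helper are illustrative assumptions (roughly following common Indian formats), not the regexes actually used in Privit, and real OCR output would need more tolerant patterns.

```python
import re

# Hypothetical patterns for illustration only; these are NOT Privit's own regexes.
# They assume common Indian formats and clean OCR output.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 10-digit mobile number starting with 6-9, with an optional +91 or 0 prefix
    "phone": re.compile(r"(?<!\d)(?:\+91[\s-]?|0)?[6-9]\d{9}(?!\d)"),
    # 12 digits, optionally written in groups of four
    "aadhaar": re.compile(r"(?<!\d)\d{4}[\s-]?\d{4}[\s-]?\d{4}(?!\d)"),
    # 6-digit PIN code that is not part of a longer run of digits
    "pin_code": re.compile(r"(?<!\d)[1-9]\d{5}(?!\d)"),
    # Registration plates such as "MH 12 AB 1234"
    "vehicle_number": re.compile(
        r"\b[A-Z]{2}[\s-]?\d{1,2}[\s-]?[A-Z]{1,3}[\s-]?\d{4}\b"
    ),
}


def find_pii(strings):
    """Return a dict mapping each PII category to the matches found in strings."""
    found = {}
    for text in strings:
        for category, pattern in PII_PATTERNS.items():
            for match in pattern.finditer(text):
                found.setdefault(category, []).append(match.group())
    return found


if __name__ == "__main__":
    # Made-up strings standing in for the text recognised from an image.
    sample = [
        "Contact: +91-9876543210, mail jane.doe@example.com",
        "Flat 12, New Delhi 110020",
        "Car MH 12 AB 1234 parked outside",
    ]
    for category, matches in find_pii(sample).items():
        print(category, matches)
```

The digit lookarounds keep the patterns from overlapping, so a 10-digit phone number is not also reported as a 6-digit PIN code. Categories such as addresses or named entities are harder to capture with regexes alone and are left out of this sketch.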



Dataset

  • The ICDAR 2015 incidental scene text detection dataset was used for the deep learning model to detect text.

  • We collected 10,000+ images to test our program.

  • To detect different PIIs, we ensured that our dataset contained images from different scenarios, such as text-focussed images, location images, etc.


(The bigger image getting broken into smaller images)





Analysis

  • We observed many people posting their FIRs (First Information Reports) on social media and tagging officials. Doing so exposed a lot of their personal details, which can easily be compromised.

  • People also had a common habit of posting images with vehicle number plates in the background.

  • Some people habitually share stories and their location on a daily basis. From these, we could detect the places they regularly visit and their neighbouring house or workplace.



Types of PII        % of time they appeared
PinCode/Address     0.35
Phone Number        0.437
Vehicle Numbers     0.2
Named Entities      0.13
Email-id            0.007


Results

Let us demonstrate with an example how much information a single PII can reveal.


From just the car number, we were able to get the address, the owner's name and the phone number.




Members:

  • Ajit Singh, 2018009
  • Anikait Sahota, 2018016
  • Amisha Upadhyay, 2018128
  • Bhunesh, 2018280
  • Ishaan Arora, 2018041
  • Praphull Dass, 2018071
  • Sahil Sharma, 2018088
  • Shikhar Tiwari, 2018100

