You_v2.0: Do your tweets reveal your personality?

Overview



Problem Statement:

If you’ve ever seen The Social Dilemma, you’d know that big tech companies have used, and continue to use, the data they collect on social media against you. Unfortunately, that includes your personality. If you haven’t seen it, boy, are you in for a shock. 


I’m sure you’re aware that we give up a huge amount of personal information about ourselves every time we go online. From seemingly harmless BuzzFeed quizzes to increasingly personal security questions (“First home? First pet? First teacher?”), companies like Google and Facebook have all of it. And they don’t just store it on their fancy servers somewhere. They use it against us to generate more revenue. 


For example, did you know that, according to the Myers-Briggs Type Indicator, SJs (sensing-judging types) tend to resist change and to make decisions too quickly? From traits like these, one could infer that people who believe conspiracy theories may be more likely to be SJ types. 




Why Should You Care?

Okay, so they have it. And they sell it to advertisers and use it to make our recommendations and feeds better and more tailored to us. How is that a bad thing? Doesn’t it make life easier if I have to think less about the choices I make, or if Google knows what I’m searching for before I finish typing it? Well, mostly. I can’t argue with that. But it’s the behind-the-scenes stuff, the things we don’t see or realize, that really matters. For instance, can they take a post about an event your friends excluded you from and show it to you again and again to make you angrier? Yes. Can they exploit your mental health issues to push you into saying or posting something hateful that you may regret later but will never be able to take down? Yes. 


Background

Key Concepts

In order to understand the repercussions of this data collection better, let’s take a quick look at some of the key concepts surrounding this issue.


Privacy

Every human being has a fundamental right to a certain degree of privacy. In terms of data, privacy refers to the control that individuals have over the data they share online. Unfortunately, in this age of Big Tech, that control has diminished greatly.



Targeted Advertising

With all the information Big Tech is collecting about us, it adopts a Big Brother-like persona. Anytime we use the internet, we leave behind digital traces which are used by Big Tech to keep track of all our traits, preferences, personality types and so much more. This can then be used by companies (I’m talking to you, Google!) to send us targeted advertisements which use our traits and preferences to manipulate us into doing or purchasing certain things which we may not want to do in the first place!



Surveillance Capitalism

Coined by Shoshana Zuboff, this term refers to the dystopian reality in which our personal data is commodified by Big Tech companies. These companies collect thousands of data points about users without any meaningful mechanism of consent. This data is then sold and used to steer and manipulate users in certain directions. According to Zuboff, this is the new logic of capitalism that governs society. Ask yourself this: do we really want to let corporate advertising manipulate us constantly?



Echo Chambers

An echo chamber is an environment where a person only encounters information or opinions that reflect and reinforce their own. Why should we care? Because consuming misinformation within the narrow confines of our online worlds inhibits our ability to know what’s true, to make choices based on the most accurate information, to make informed decisions about what to believe, and to resist the many invisible forces that might not have our best interests in mind. 

 



What Did We Do? 

Okay, yikes. Sounds terrifying enough yet? How can we fix this? Can we even fix this? How do we know what to do about it if we don’t even know what kind of profile they’ve made of us? We created a website that emulates what big tech companies do: we take a Twitter handle, collect its personally identifiable information and public tweets, and infer the user’s online personality from them. We then give personalized, actionable advice based on the personality type and its strengths and weaknesses.


The Tech Stuff

Technologies Used - ML, NLP, ..

We chose the Personality Traits on Twitter dataset, which contains around 7.8 tweets of users who have self-identified their MBTI personality type on Twitter. We chose this dataset because of its size and because it links users’ Twitter posts to their MBTI personality types. The other datasets we came across were either small, because they were manually annotated, or drawn from other forums such as Personality Cafe, which generalised less well to users’ Twitter data. One important point to note is that since the 16 personality types are not uniformly distributed in the population, our dataset is also imbalanced. The distribution of users across the different personality types is depicted in the image below.
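To see why that imbalance matters for modelling, here is a small Python sketch that collapses counts over the 16 types into counts for each of the four letter pairs. The counts in it are made up for illustration; they are not the dataset’s real distribution. Because each side of an axis pools eight types, every binary class ends up with far more examples than the rarest of the 16 types, although an axis can still be lopsided (here only 10% S).

```python
from collections import Counter

# The four MBTI letter pairs, in the order the letters appear in a type code.
AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def axis_distribution(type_counts: dict[str, int]) -> list[Counter]:
    """Collapse user counts per 16-letter type into letter counts for each of the 4 axes."""
    per_axis = [Counter() for _ in AXES]
    for mbti, count in type_counts.items():
        for i, letter in enumerate(mbti.upper()):
            per_axis[i][letter] += count
    return per_axis


# Hypothetical counts for illustration only; NOT the real dataset distribution.
example = {"INFP": 40, "INFJ": 30, "ENFP": 20, "ESTJ": 10}
for (a, b), counts in zip(AXES, axis_distribution(example)):
    total = counts[a] + counts[b]
    print(f"{a}/{b}: {counts[a]}/{counts[b]} ({counts[a] / total:.0%} {a})")
```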

For Predicting Personality

We used 4 Naive Bayes classifiers, one for each personality dimension. Each model was trained on data from 15,420 profiles and 7,71,000 tweets. Initially, a single model was used to classify a user into one of the 16 personality types based on their tweets. We later replaced it with 4 Naive Bayes models, one for each of the 4 complementary trait pairs. This significantly improved our results.
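Here is a minimal sketch of how a type code could be split into four binary labels for training, and how the four models’ outputs could be recombined into one type. The helper names (`split_type`, `combine`) and the 0.5 decision threshold are illustrative assumptions, not necessarily what the project code does.

```python
# The four letter pairs; label 1 means the first letter of the pair.
AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def split_type(mbti: str) -> list[int]:
    """Turn a 4-letter type such as 'INTJ' into four binary labels, one per axis model."""
    mbti = mbti.upper()
    if len(mbti) != 4:
        raise ValueError(f"expected a 4-letter MBTI code, got {mbti!r}")
    labels = []
    for letter, (first, second) in zip(mbti, AXES):
        if letter not in (first, second):
            raise ValueError(f"{letter!r} is not valid for the {first}/{second} axis")
        labels.append(1 if letter == first else 0)
    return labels


def combine(probabilities: list[float]) -> str:
    """Combine the four per-axis probabilities P(first letter) into one type code."""
    return "".join(first if p >= 0.5 else second
                   for p, (first, second) in zip(probabilities, AXES))


print(split_type("INTJ"))              # [0, 0, 1, 1]
print(combine([0.2, 0.35, 0.8, 0.6]))  # INTJ
```

Each axis model only has to separate two classes, which is the whole point of the split: the 16-way problem becomes four 2-way problems with much more data per class.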

For Suggesting Personalised Actionable Advice

Once the personality type was found, we analyzed the strengths and weaknesses of each type to identify how it could potentially be manipulated, drawing on existing research as well as a separate manual analysis by the group’s researchers. 


For example, ISFJ:

"Because of your altruistic and supportive nature, you may be manipulated by emotionally triggering scams. You may have seen some about people in dire straits requiring money for operations, etc. Please be careful to verify the sources of such requests online."

"Since you are reluctant to change, your current familiar tastes and opinions can be used against you to incite anger and hatred. Please make sure that you support causes based on a logical foundation rather than blind faith."

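One simple way to serve this advice is a lookup table keyed by personality type, with the general recommendations appended at the end. This is a hypothetical sketch: only the ISFJ entry (abbreviated from the examples above) is filled in, and `GENERAL_ADVICE` holds a placeholder rather than the site’s actual recommendations.

```python
# Hypothetical advice table; only ISFJ is filled in, abbreviated from the examples above.
ADVICE = {
    "ISFJ": [
        "Because of your altruistic and supportive nature, you may be manipulated by "
        "emotionally triggering scams. Please verify the sources of such requests online.",
        "Since you are reluctant to change, your familiar tastes and opinions can be used "
        "to incite anger and hatred. Support causes based on logic rather than blind faith.",
    ],
}

# Placeholder for the common recommendations shown to every user (not the real text).
GENERAL_ADVICE = ["<general social media safety recommendations go here>"]


def advice_for(mbti: str) -> list[str]:
    """Return type-specific advice (if any) followed by the general advice."""
    return ADVICE.get(mbti.upper(), []) + GENERAL_ADVICE


for line in advice_for("isfj"):
    print("-", line)
```

Keeping the advice as data rather than code means the researchers can edit the text for each type without touching the prediction pipeline.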

How Did We Train The Model

Initially, one model was used to classify a user into one of the 16 personality types based on their tweets. This pipeline gave poor results because the number of output classes was very large compared to the size of the input data. Therefore, we created 4 separate models, one for each of the 4 complementary trait pairs, and combined their outputs to generate the user’s personality type. This significantly improved the predictive power of the machine learning pipeline. We passed the tweets through standard NLP preprocessing steps, such as removing stop words and punctuation, and used a bag-of-words representation to build feature vectors. For the machine learning models, we tried a number of different combinations of classifiers and hyperparameters, and found that a Naive Bayes classifier gave the best results on both training and test classification.
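The pipeline described above (stop-word and punctuation removal, bag-of-words features, Naive Bayes) can be sketched in plain Python for a single axis. This is our own from-scratch multinomial Naive Bayes with add-one smoothing, written without libraries so it runs anywhere; the project may well have used a library implementation such as scikit-learn’s instead, and the class name here is ours, not that library’s API. The stop-word list is a tiny stand-in and the training tweets are invented. In practice you would concatenate each user’s tweets into one document and train four such models, one per axis.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real pipelines use a fuller list.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "i", "to", "of", "in", "it", "my"}


def preprocess(text: str) -> list[str]:
    """Lower-case, keep runs of letters/apostrophes (dropping punctuation), drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[int]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Word counts for this class: the bag-of-words representation.
            counts = Counter(tok for d in class_docs for tok in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + self.alpha) / denom) for tok in self.vocab
            }
        return self

    def predict_proba(self, doc: list[str]) -> dict[int, float]:
        # Log prior plus log likelihood of each known token; unseen words are skipped.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        # Normalise via log-sum-exp to avoid floating-point underflow.
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: v / total for c, v in exp.items()}

    def predict(self, doc: list[str]) -> int:
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


# Invented toy tweets for the E/I axis (1 = E, 0 = I); not from the real dataset.
train = ["Party tonight with everyone!", "Loved meeting so many new people",
         "Quiet evening reading my book", "Staying in alone with tea"]
y = [1, 1, 0, 0]
model = MultinomialNB().fit([preprocess(t) for t in train], y)
print(model.predict(preprocess("Cannot wait to meet people at the party")))
```

Naive Bayes is a reasonable fit here: it trains in one pass over the word counts, copes with very large vocabularies, and tends to hold up well when data per class is limited.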


Our Solution



Explanation on how to use the Website

The website allows the user to enter their public Twitter username.

Then on the backend, upon authentication, the Twitter API is used to perform social media data scraping on the user’s profile. Public data from the account such as the user’s bio, tweets, retweets, likes, etc. are collected and analyzed by our models to form a broad online shadow profile. 
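For the data-collection step, a rough sketch using the Twitter API v2 user-lookup (`/2/users/by/username/:username`) and user-timeline (`/2/users/:id/tweets`) endpoints with a bearer token might look like this. We don’t know which API version or client library the project actually used, pagination is left out, and access rules for the API have changed since, so treat it as an outline only. Only the request-building functions run without network access.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.twitter.com/2"


def build_request(path: str, params: dict[str, str], bearer_token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a Twitter API v2 endpoint."""
    url = f"{API_BASE}{path}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})


def user_lookup_request(username: str, token: str) -> urllib.request.Request:
    # Ask for the bio (description) and profile picture shown on the results page.
    return build_request(f"/users/by/username/{urllib.parse.quote(username)}",
                         {"user.fields": "description,profile_image_url"}, token)


def tweets_request(user_id: str, token: str, max_results: int = 100) -> urllib.request.Request:
    # The timeline endpoint returns between 5 and 100 tweets per page.
    if not 5 <= max_results <= 100:
        raise ValueError("max_results must be between 5 and 100")
    return build_request(f"/users/{user_id}/tweets", {"max_results": str(max_results)}, token)


def fetch_json(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network access and a valid token)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Usage, with TOKEN as your own bearer token (not run here):
# user = fetch_json(user_lookup_request("some_handle", TOKEN))["data"]
# tweets = fetch_json(tweets_request(user["id"], TOKEN))["data"]
```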

The next frame displays details of the account, such as the user’s profile picture and bio, along with the results of the personality analysis.

The Myers–Briggs Type Indicator (MBTI) personality type of the user as inferred from the public data is presented. This can be one of the following 16 personality types:

  • ISTJ - The Inspector

  • ISTP - The Crafter

  • ISFJ - The Protector

  • ISFP - The Artist

  • INFJ - The Advocate

  • INFP - The Mediator

  • INTJ - The Architect

  • INTP - The Thinker

  • ESTP - The Persuader

  • ESTJ - The Director

  • ESFP - The Performer

  • ESFJ - The Caregiver

  • ENFP - The Champion

  • ENFJ - The Giver

  • ENTP - The Debater

  • ENTJ - The Commander


Each type is presented as a code made up of one letter from each of the following 4 scales:

  • Extraversion (E) – Introversion (I)

  • Sensing (S) – Intuition (N)

  • Thinking (T) – Feeling (F)

  • Judging (J) – Perceiving (P)


Further, the result also depicts where the user lies on the spectrum of the 4 scales.
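One plausible way to get that spectrum position is to read it off each axis model’s probability for the first letter of the pair. This sketch assumes the four models output such probabilities (as Naive Bayes does); keep in mind that Naive Bayes probabilities are often poorly calibrated, so a “90%” is better read as a relative lean than as a precise measurement.

```python
# Full names of the four scales, ordered as in the type code.
AXES = [("Extraversion", "Introversion"), ("Sensing", "Intuition"),
        ("Thinking", "Feeling"), ("Judging", "Perceiving")]


def spectrum(probabilities: list[float]) -> list[str]:
    """Describe each axis by its dominant pole and share, given P(first pole) per axis."""
    lines = []
    for p, (first, second) in zip(probabilities, AXES):
        pole, share = (first, p) if p >= 0.5 else (second, 1 - p)
        lines.append(f"{round(share * 100)}% {pole}")
    return lines


print(spectrum([0.3, 0.45, 0.9, 0.5]))
```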

The website also presents a common set of recommendations for users to keep in mind while using online social media (OSM) platforms such as Twitter.


Finally, the main section of the website lists user-specific actionable advice. 

This includes a detailed section regarding the strengths and weaknesses of the perceived personality type. Additionally, it presents research-backed information about the potential ways in which the user can be targeted and manipulated based on their MBTI personality type. It also displays user-specific recommendations of digital practices which can be incorporated to safeguard against potential manipulation.


Benefits of our proposed solution

Our solution is designed in the form of a website. The MBTI personality type prediction based on the user’s Twitter account alerts the user to the extent to which their seemingly innocuous personal data, such as mere tweets, can be used by big tech companies to create an accurate profile of them. 

Via real-time demonstration, our solution highlights how psychological profiling can be used to identify the strengths and weaknesses of a user. Further, it portrays the tip of the iceberg of how this information can potentially be weaponized for targeted manipulation.

This spreads awareness about how even rudimentary psychological profiling makes users susceptible to manipulation by discovering and exploiting their weaknesses. Big tech companies have more detailed and accurate data on users, which can be used to predict their behaviour and manipulate their choices. For example, Cambridge Analytica used deceptive means to gain access to “granular” Facebook data on more than 50 million Americans and used it to tailor political messaging for Trump’s 2016 presidential campaign, which he went on to win. The information became fodder for a political campaign without users’ knowledge or consent. The documentary The Great Hack shows how “persuadable” users were identified through psychological profiling and then targeted by the campaign. 

Finally, the salient facet of our solution is to provide user-specific actionable advice. 

Our solution presents research-backed information about the potential ways of manipulation based on MBTI personality type and digital practices to safeguard against them. 


Thus, our solution will arm users with greater general awareness and the user-specific actionable advice will enable them to safeguard against potential manipulation.


User Evaluation

Our goals for the project were:

1. To let people know that their online behaviour and activity can reveal their personality type, and that organisations such as big tech companies can use those traits to target and manipulate them.

2. To provide users with actionable advice tailored for their personality characteristics and what they are most prone to fall for. 


The user evaluation methodology followed was:

1. Comparison of MBTI Test Result with our Model Prediction

We recruited 15 participants who were active Twitter users. We asked them to take the MBTI personality test and then to use our website.

We also selected 15 celebrities who are active on Twitter, ran their accounts through our website, and compared the results with their personality types as reported online. 


2. Website Evaluation

Here we again used the same set of 15 participants. We conducted a survey where they had to evaluate the usability of the website by using the SUS (System Usability Scale) Evaluation Survey. We also asked the participants to evaluate our personalised actionable advice and provide some feedback for the website. 
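SUS has a fixed scoring rule: each of the 10 items is answered on a 1 to 5 scale, odd-numbered items contribute (score - 1), even-numbered items contribute (5 - score), and the sum is multiplied by 2.5 to give a score between 0 and 100. A small sketch of that rule, plus a helper to average over participants, follows; it doesn’t include any of our participants’ actual answers.

```python
def sus_score(responses: list[int]) -> float:
    """Score one SUS questionnaire: 10 answers on a 1-5 scale, in questionnaire order."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses: list[list[int]]) -> float:
    """Average SUS score across participants."""
    return sum(sus_score(r) for r in all_responses) / len(all_responses)
```

Note that an SUS score is not a percentage, even though it runs from 0 to 100; it only makes sense compared with other SUS scores.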


User Evaluation Results




60% of users had at least 3 attributes correctly predicted
26% of users had all 4 attributes correctly predicted
46% of celebrities had at least 3 attributes correctly predicted
13% of celebrities had all 4 attributes correctly predicted
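These figures compare the predicted and actual types letter by letter. Here is a minimal sketch of that scoring with hypothetical pairs (not the study’s data). With 15 people per group, each person is worth about 6.7 percentage points, so 26% and 46% appear to correspond to 4 and 7 people (26.7% and 46.7%). The sketch also computes a rough chance baseline, assuming each letter is an independent coin flip: 31.25% for at least 3 correct letters and 6.25% for all 4. Because the letters are imbalanced in reality, always guessing the most common letter could beat that baseline, so treat it only as a loose reference point.

```python
from math import comb


def matching_letters(predicted: str, actual: str) -> int:
    """Count how many of the 4 MBTI letters agree, position by position."""
    return sum(p == a for p, a in zip(predicted.upper(), actual.upper()))


def summarise(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Share of people with at least 3, and with all 4, letters predicted correctly."""
    matches = [matching_letters(p, a) for p, a in pairs]
    n = len(matches)
    return {"at_least_3": sum(m >= 3 for m in matches) / n,
            "all_4": sum(m == 4 for m in matches) / n}


def chance_baseline(p: float = 0.5) -> tuple[float, float]:
    """P(at least 3 correct) and P(all 4 correct) when each letter is guessed
    independently with success probability p (0.5 = a coin flip per letter)."""
    at_least_3 = comb(4, 3) * p**3 * (1 - p) + p**4
    return at_least_3, p**4


# Hypothetical (predicted, actual) pairs; not the study's data.
example = [("INTJ", "INTJ"), ("INFP", "INFJ"), ("ESTP", "ISFJ")]
print(summarise(example))
print(chance_baseline())
```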


Usability Feedback


User Feedback

“I am thinking of changing my profile settings so that only people that I explicitly allow can see the content I post on Twitter” - Nivedit

“Was amazed how accurate it was” - Arunesh

“The website was very easy to use and informative. I had a great experience.” - Kartik

“I was stunned by how much my tweets reveal my personality type.” - Febron

“This website helped me become aware of what I tweet... really great personalised advice too!” - Madhav

“I am thinking of not tweeting out all the ideas I have in my head so as to not give away who I am as a person to anyone on the internet” - Pratham


Meet the Team








