DeFake.mp4
Motivation
Whenever anything remotely interesting happens, social media is flooded with information about it, so much so that social media is now a primary source of information and news for a large percentage of people. This creates a problem, because social media platforms are largely unfiltered and fake news is no longer easy to detect. We got in touch with many fact-checking organisations to understand the problems they face during fact-checking. We found that fact-checking involves a lot of manual, time-consuming labour. As the media for spreading information multiply, the job of fact-checkers becomes harder. These days, a lot of misinformation spreads through videos and live broadcasts, yet it is very time-consuming for a fact-checker to scrub through a video, identify the claims made in it and fact-check them. This problem was reported by all the fact-checking organisations we talked to.
Problem Statement
Based on the experiences shared by the fact-checking organisations (hereafter referred to as users), we defined our problem statement as:
“The problem of fake news, as of now, cannot be solved without human intervention. Thus, we aim to aid the process by rapidly reducing the time it takes for a person to get relevant claims from a video to verify.”
Proof of Significance
While working on this problem statement, we encountered the exact problem we were trying to solve. Members of the team sat down, collected news videos from different problems and watched each video in full to extract relevant claims. We looked for videos that could plausibly contain elements of fake news; currently, a lot of the fake news in India revolves around COVID-19 and politics. Once we had the claims, we used fact-checking tools to determine whether they were true. Because of the nature of the video format, we had to sift through the entire video to find relevant information; only then could we mark its claims as fake or authentic news.
Importance Of Addressing Fake News
Fake news has been around for decades, but with the exponential growth of the internet and the ever-increasing number of users on social media platforms, it has become a part of our daily lives. Because anyone can say anything on the internet, a lot of the information there is unfiltered and unverified. Bad actors can exploit this to their advantage, sometimes with disastrous effects. Fake news can be broadly classified into four types:
Satire - Intentionally fake stories that aren’t supposed to be taken seriously. There’s a lot of satirical content on the internet made by individuals and organisations, usually for entertainment. However, satire that is pushed too far in the interest of gaining attention can be exceedingly misleading.
Clickbait - Content designed to be attention-grabbing in the hope of getting more users to click on it.
Propaganda - The spread of personal interests and ideas through the use of incorrect or misleading information. Mutually opposing groups often use fake propaganda to blur the lines between right and wrong.
Mistakes - Unintentional fake stories, due to human error.
As more people get on the internet, there is only going to be more fake news. Negative elements of society are getting smarter about how they operate, and law enforcement and individuals need to stay up to date. Not everyone on the internet has adequate training or the mental models needed to scrutinise the information presented to them. The rise in the number of people who are anti-vaxxers or who believe in conspiracy theories such as “the earth is flat” or “lizard people” shows how important this issue is.
Fake news can spread like wildfire. News stories designed to be sensational, easily shareable or propagandistic can quickly penetrate all classes of our society. Even once a story has been fact-checked and found to be false, only a small percentage of the people who saw the fake story ever learn that it was inauthentic. Thus, a rapid response to fake news is crucial.
Requirement Gathering
Interviews
We interviewed Pratik Sinha, founder of AltNews, an Indian non-profit fact-checking website. Pratik walked us through the fact-checking process at AltNews and its pain points, and explained how time-consuming it is to fact-check a video. Given the volume of misinformation that is spread through videos, we found this a good problem area to work on.
Academic Literature
Knuutila et al. [1] analysed the presence of COVID-related videos spreading misinformation on YouTube and the sharing of such videos on other social media platforms like Facebook, Twitter and Reddit. They show that YouTube takes weeks to remove such videos from its platform (41 days on average), during which the videos garner more shares and views than the top-5 English language news sources combined. They also found that most of these videos’ viewership comes through Facebook shares, of which Facebook successfully flagged less than 1% with warning labels. This situation calls for better and more robust misinformation detection systems for the video medium, and for tools and technologies that can help organisations fact-check such videos quickly and efficiently.
Audit experiments were conducted in [2] to understand the spread of misinformation through YouTube’s Search and Watch recommendations. The findings reveal that the impact of demographic data on misinformation in a user’s recommendations is more significant for users with a developed watch history than for new users. YouTube’s algorithm does try to counter a user viewing misinformative videos by suggesting other videos of a “debunking” nature, but these suggestions are not available for every topic. The study also highlights a “filter bubble effect”, wherein watching misinformative videos leads to more misinformative videos appearing in search results.
From the point of view of a person tackling misinformation, YouTube’s algorithm may be counterproductive by adding more misinformative videos in the search results of a person looking for a video of factual and debunking nature. This may force the user to manually watch each video’s entirety to extract the information being spread and classify it as true/false.
Proposed Solution
Data Gathering
All members of the team worked together on developing the dataset. For our preliminary analysis we collected around 20-25 videos, all on the topic of COVID. On this data we performed sentiment analysis and gathered further information from the metadata, such as likes, comments, number of subscribers and length of video.
For our final dataset, we collected videos from YouTube that had captions available. This was done to validate whether the problem actually persists and whether our solution would work for it. We verified that there was a lot of fake news available on YouTube, even though YouTube actively works to remove fake news from its platform. Our solution uses the captions provided by the uploader, or closed captions, to obtain the claims. In addition, we used speech-to-text for videos that did not have captions available. From these captions, we were able to extract claims.
Claim Extraction
Since the captions downloaded from the videos are very long and going through each sentence would be time-consuming, we extract the main concepts from them. This has two advantages. Firstly, the people extracting claims will know exactly what the video claims. Secondly, it makes it easier for the fact-checking API to understand the query and show only relevant results. Claim extraction is done by sorting the sentences in the video to obtain the sentences with the most impact. Often, the captions are auto-generated and hence lack distinct sentences. In this case, we use a pre-trained model to add punctuation so that we have a set of meaningful sentences [3].
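The sentence-sorting step can be sketched as follows. This is only an illustration: the write-up does not say how “impact” is measured, so the sketch assumes a simple word-frequency score, and the names `STOPWORDS`, `split_sentences`, `tokenize` and `rank_sentences` are hypothetical. It also assumes the captions are already punctuated.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the project).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in",
             "that", "it", "this", "was", "for", "on", "with", "as", "be"}


def split_sentences(text):
    """Split punctuated caption text on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and keep the words that are not stopwords."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def rank_sentences(text, top_k=3):
    """Return the top_k sentences, scored by average word frequency.

    Assumption: a sentence has more 'impact' when its words recur
    often across the whole caption text.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        return sum(freq[w] for w in words) / len(words)

    return sorted(sentences, key=score, reverse=True)[:top_k]
```

Calling `rank_sentences(captions, top_k=5)` would give a short list of candidate claims to pass on to the fact-check lookup; a real system might swap the frequency score for TF-IDF or a trained claim-detection model.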
Google FactCheck API
Google FactCheck is a tool developed by Google to combat misinformation. It allows users to search for any claim, sentence or word and get the most relevant fact checks carried out on it. Our tool uses this API in conjunction with the claims extracted from the video. We feed each claim to the API and, from the top results, store the fact-check title, the text and the rating given by the fact-checking agency. This stored data is then used to verify the claims we extract from the video.
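A minimal sketch of this lookup is shown below. It assumes the `claims:search` endpoint of Google’s Fact Check Tools API and its documented response fields (`claims`, `claimReview`, `title`, `textualRating`); the helper names, the default parameters and the stored record layout are hypothetical and not taken from our pipeline.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Fact Check Tools API (claims:search).
API_URL = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def build_search_url(claim, api_key, language="en", page_size=5):
    """Build the request URL for one extracted claim."""
    params = {
        "query": claim,
        "languageCode": language,
        "pageSize": page_size,
        "key": api_key,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_fact_checks(payload):
    """Keep the title, claim text and rating of every review in a response."""
    records = []
    for item in payload.get("claims", []):
        for review in item.get("claimReview", []):
            records.append({
                "title": review.get("title", ""),
                "text": item.get("text", ""),
                "rating": review.get("textualRating", ""),
                "publisher": review.get("publisher", {}).get("name", ""),
            })
    return records


def search_fact_checks(claim, api_key):
    """Query the API for one claim (needs network access and a valid key)."""
    url = build_search_url(claim, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_fact_checks(json.load(response))
```

Keeping the parsing separate from the network call means the stored records can be checked against a saved response without an API key.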
Verifying the claims extracted
To check the claims we extract against the results returned, we added a step to our pipeline that verifies whether the Google FactCheck results are relevant to each claim. The claim and the Google FactCheck results are vectorised. We then compute the cosine similarity between the extracted claim vector and each Google FactCheck result vector, and the result with the maximum similarity is presented as the top result. This ensures that the Google results closest to the actual claim made in the video show up first, and it reduces some of the ambiguity of the claim and the erroneous results that ambiguity produces.
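The ranking step can be illustrated with a small sketch. The write-up does not state which vectorisation is used, so this example assumes plain word-count vectors; `vectorize`, `cosine_similarity` and `rank_results` are hypothetical names.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a sparse word-count vector (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_results(claim, results):
    """Return (similarity, result) pairs, most similar result first."""
    claim_vec = vectorize(claim)
    scored = [(cosine_similarity(claim_vec, vectorize(r)), r) for r in results]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For a claim such as "5G towers spread the coronavirus", a fact check titled "No, 5G towers do not spread the coronavirus" shares most of its words with the claim and is ranked above an unrelated fact check. Word counts ignore meaning, so sentence embeddings or TF-IDF weights may be a better fit in practice.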
Future Work
While exploring this problem statement and developing our solution, we identified another key problem in the same space. A good chunk of fake news is spread in regional languages, through videos made by independent users. Our project focused only on videos in English, so an important extension would be to add a translation component to support more languages.
References
[1] A. Knuutila, A. Herasimenka, H. Au, J. Bright, and P. N. Howard, “COVID-related misinformation on YouTube: The spread of misinformation videos on social media and the effectiveness of platform policies,” Data Memo 2020.6. Oxford, UK: Project on Computational Propaganda. comprop.oii.ox.ac.uk
[2] E. Hussein, P. Juneja, and T. Mitra, “Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube,” Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW1, pp. 1–27, 2020.
[3] O. Tilk and T. Alumäe, “Bidirectional recurrent neural network with attention mechanism for punctuation restoration,” Interspeech 2016. doi:10.21437/interspeech.2016-1517