IJCAI 2020 Tutorial on Fact-Checking, Fake News, Propaganda, and Media Bias: Truth Seeking in the Post-Truth Era

Many recent important events, such as political elections or the coronavirus (COVID-19) outbreak, have been characterized by widespread diffusion of misinformation. How can AI help?

Description

The rise of social media has democratized content creation and has made it easy for everybody to share and spread information online. On the positive side, this has given rise to citizen journalism, thus enabling much faster dissemination of information compared to what was possible with newspapers, radio, and TV. On the negative side, stripping traditional media of their gate-keeping role has left the public unprotected against the spread of misinformation, which can now travel at breaking-news speed over the same democratic channel. This has led to the proliferation of false information specifically created to affect individual people’s beliefs, and ultimately to influence major events such as political elections. There are indications that false information was weaponized at an unprecedented scale during Brexit and the 2016 U.S. presidential election. “Fake news,” which can be defined as fabricated information that mimics news media content in form but not in organizational process or intent, became the Word of the Year for 2017, according to Collins Dictionary. Thus, limiting the spread of “fake news” and its impact has become a major focus for computer scientists, journalists, social media companies, and regulatory authorities.

The tutorial will offer an overview of the broad and emerging research area of disinformation, with a focus on the latest developments and research directions. It will cover recent work on a number of related problems, such as detecting misinformation, disinformation, “fake news”, rumors, and clickbait; fact-checking; stance, bias, and propaganda detection; source reliability estimation; and detecting bots, trolls, and seminar users. We will also discuss recent advances in the automatic generation of text (e.g., GPT-2 and GROVER) and of images and videos (e.g., “deep fakes”), and their implications for robojournalism and “fake news” generation.

Prior knowledge of NLP, machine learning, and deep learning will be needed to understand large parts of this tutorial. However, whenever possible, we will try to give a high-level description of the algorithms and concepts so that a larger audience can benefit from the tutorial.

Tutorial Structure

Here is a tentative, yet detailed, structure for the tutorial.

Introduction [20 mins]
1. What is “fake news”?
2. “Fake news” as a weapon of mass deception
  - Impact of “fake news” on politics, finance, and health
  - Does it really work?
  - Can we win the war on “fake news”?
Check-worthiness [15 mins]
1. Task definition
2. Datasets
3. Approaches
  1. ClaimBuster
  2. ClaimRank: modeling the context, multi-source learning, multi-linguality
  3. CLEF shared tasks
Fact-checking [40 mins]
1. Task definitions
2. Walk-through example: how humans verify a claim manually
3. Datasets: Snopes, “Liar, Liar Pants on Fire”, FEVER
4. Information sources: knowledge bases, Wikipedia, Web, social media
5. Tasks and approaches
  1. fact-checking against knowledge bases
  2. fact-checking against Wikipedia
  3. fact-checking claims using the Web
  4. fact-checking rumors in social media
  5. fact-checking multi-modal claims, e.g., about images
  6. fact-checking the answers in community question answering forums
6. Shared tasks at SemEval and FEVER
Fake News Detection [30 mins]
1. Task definitions and examples
2. Datasets: NELA-GT-2018, etc.
3. The language of fake news
4. Special case: clickbait
5. Tasks and approaches
  1. neural methods for fake news detection
  2. multi-linguality
Coffee Break [30 mins]
Stance Detection [20 mins]
1. Task definitions and examples
2. Datasets
3. Stance detection as a key element of fact-checking
4. Information sources: text, social context, user profile
5. Tasks and approaches
  1. neural methods for stance detection
  2. cross-language stance detection
6. Shared tasks at SemEval and the Fake News Challenge
Source Reliability and Media Bias Estimation [20 mins]
1. Task definitions and examples
2. Datasets: Media Bias/Fact Check, AllSides, OpenSources, etc.
3. Source reliability as a key element of fact-checking
4. Special case: hyper-partisanship
5. Information sources: article text, Wikipedia, social media
6. Tasks and approaches
  1. neural methods for source reliability estimation
  2. multi-modality
  3. multi-task learning
Propaganda Detection [40 mins]
1. Task definitions and examples
2. Propaganda techniques and examples
3. Datasets
4. Tasks and approaches
  1. neural methods for propaganda detection
  2. multi-linguality
Malicious User Detection [10 mins]
1. Typology of malicious users
2. How can trolls be stopped?
3. Datasets
4. Tasks and approaches
  1. opinion manipulation troll detection
  2. understanding the role of political trolls
  3. bot detection
Future Challenges [15 mins]
1. Deep fakes: images, voice, video, text
2. Text generation: GPT-2, GROVER
3. Defending against fake news
4. Applications: media regulatory compliance
5. New emergent threats

Tutorial Speakers

Preslav Nakov

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar

Dr. Preslav Nakov is a Principal Scientist at the Qatar Computing Research Institute (QCRI), HBKU. His research interests include computational linguistics, “fake news” detection, fact-checking, machine translation, question answering, sentiment analysis, lexical semantics, Web as a corpus, and biomedical text processing. He received his PhD degree from the University of California at Berkeley (supported by a Fulbright grant), and he was a Research Fellow at the National University of Singapore, an honorary lecturer at Sofia University, and research staff at the Bulgarian Academy of Sciences. At QCRI, he leads the Tanbih project, developed in collaboration with MIT, which aims to limit the effect of “fake news”, propaganda, and media bias by making users aware of what they are reading. Dr. Nakov is President of ACL SIGLEX, Secretary of ACL SIGSLAV, and a member of the EACL advisory board. He is a member of the editorial boards of TACL, CS&L, NLE, AI Communications, and Frontiers in AI. He is also on the Editorial Board of the Language Science Press Book Series on Phraseology and Multiword Expressions. He co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals. He received the Young Researcher Award at RANLP’2011, and he was the first to receive the Bulgarian President’s John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov’s research on “fake news” has been featured by over 100 news outlets, including Forbes, Boston Globe, Aljazeera, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.

Giovanni Da San Martino

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar

Giovanni Da San Martino is a Scientist at the Qatar Computing Research Institute, HBKU. His research interests include scalable machine learning algorithms for complex data structures, with applications to natural language processing tasks including propaganda detection, paraphrase recognition, community question answering, and stance detection. He has served on the program committees of NeurIPS, NAACL, IJCAI, EMNLP, COLING, IJCNN, and AAAI, and he has been an area chair for ACL 2019-2020. He was a co-organizer of the CLEF-2018, CLEF-2019, and CLEF-2020 Fact-checking labs; of SemEval-2020 Task 11 on fine-grained propaganda detection in the news; of the 2019 NLP4IF Workshop on censorship, disinformation, and propaganda, and of its shared task; of the 2019 Hack the News Datathon; and of the ECML/PKDD 2016 Discovery Challenge task.

Contact

If you have any questions about the tutorial, feel free to send us an email.