Propaganda Analysis Project

SemEval 2020 Task 11 is over, but we have opened a permanent leaderboard for those who want to get the data, keep making submissions on the test set, and check the state of the art on the tasks.

Task Description

Propagandistic news articles use specific techniques to convey their message, such as whataboutism, red herring, and name calling, among many others. Help us develop automatic tools to detect such techniques!

Background

We refer to propaganda whenever information is purposefully shaped to foster a predetermined agenda. Propaganda uses psychological and rhetorical techniques to reach its purpose. Such techniques include the use of logical fallacies and appealing to the emotions of the audience. Logical fallacies are usually hard to spot since the argumentation, at first sight, might seem correct and objective. However, a careful analysis shows that the conclusion cannot be drawn from the premise without the misuse of logical rules. Another set of techniques makes use of emotional language to induce the audience to agree with the speaker only on the basis of the emotional bond that is being created, provoking the suspension of any rational analysis of the argumentation. All of these techniques are intended to go unnoticed to achieve maximum effect.

Technical Description

The overall goal of the shared task is to produce models capable of spotting text fragments in which propaganda techniques are used in a news article.

We have compiled a corpus of about 550 news articles in which fragments containing one out of 18 propaganda techniques have been annotated. We split the overall task into two subtasks:

Given a plain-text document, identify those specific fragments which contain at least one propaganda technique. This is a binary sequence tagging task. We refer to it as SI (Span Identification).
Given a text fragment identified as propaganda and its document context, identify the propaganda technique applied in the fragment. Since spans can overlap, this is formally a multilabel multiclass classification problem. However, whenever a span is associated with multiple techniques, the input file will contain multiple copies of such fragments, so the problem can be treated algorithmically as a multiclass classification problem. Although the data has been annotated with 18 techniques, given the relatively low frequency of some of them, we decided to merge similar underrepresented techniques into one superclass:
- Bandwagon and Reductio ad Hitlerum into "Bandwagon,Reductio ad Hitlerum"
- Straw Men, Red Herring and Whataboutism into "Whataboutism,Straw_Men,Red_Herring"
and to eliminate "Obfuscation,Intentional Vagueness,Confusion".
Therefore this is a 14-class classification task, which we refer to as TC (Technique Classification).
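The merging described above can be sketched as a simple label mapping. The merged class names and the eliminated label are taken from the text; the spellings of the individual source labels (for example "Straw Men") and the function name to_tc_class are assumptions made for illustration and may not match the strings used in the annotation files.

```python
# Hypothetical mapping from annotated technique names to TC classes.
# The source-label spellings on the left are assumptions.
MERGED = {
    "Bandwagon": "Bandwagon,Reductio ad Hitlerum",
    "Reductio ad Hitlerum": "Bandwagon,Reductio ad Hitlerum",
    "Straw Men": "Whataboutism,Straw_Men,Red_Herring",
    "Red Herring": "Whataboutism,Straw_Men,Red_Herring",
    "Whataboutism": "Whataboutism,Straw_Men,Red_Herring",
}

# Label that is eliminated from the TC task.
DROPPED = {"Obfuscation,Intentional Vagueness,Confusion"}


def to_tc_class(label):
    """Return the TC class for an annotated label, or None if it is dropped."""
    if label in DROPPED:
        return None
    # Labels that are not merged keep their own name.
    return MERGED.get(label, label)


print(to_tc_class("Red Herring"))
print(to_tc_class("Loaded_Language"))
```

Counting classes under this mapping: 18 annotated techniques, minus one for the two-way merge, minus two for the three-way merge, minus one eliminated label, gives 14.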

The competition is divided into three phases:

Phase 1. You compete with the other participants to get the best performance on the development set. A live leaderboard will keep track of all the submissions.
Phase 2. A test set will be released and you will have a few days to submit your final predictions. Only the latest submission will be evaluated and considered when deciding the overall winner. In this phase, no immediate feedback on the submission is provided. The winner of the competition will be determined by performance on the test set.
Phase 3. All participants will submit a paper describing their system and review other participants' submissions. Accepted papers will be presented at the International Workshop on Semantic Evaluation - SemEval-2020 (co-located with COLING 2020).

Data Description

Input Articles

The input for both tasks will be news articles in plain text format. In the first phase, participants will be provided with two folders, train-articles and dev-articles (in the second phase we will release a third folder for the test set). Each article appears in one .txt file. The title is on the first row, followed by an empty row. The content of the article starts from the third row, one sentence per line. Each article has been retrieved with the newspaper3k library, and sentence splitting has been performed automatically with the NLTK sentence splitter.
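As a rough illustration of this layout, the sketch below splits the raw text of an article into its title and its sentences, keeping the character offset at which each sentence starts. The function name parse_article is hypothetical, and the sketch assumes that lines are separated by a single "\n" and that offsets are counted over the whole file, newlines included.

```python
def parse_article(text):
    """Split raw article text into (title, [(start_offset, sentence), ...])."""
    lines = text.split("\n")
    title = lines[0]
    sentences = []
    offset = 0
    for i, line in enumerate(lines):
        # Row 1 is the title, row 2 is empty, content starts at row 3.
        if i >= 2 and line:
            sentences.append((offset, line))
        offset += len(line) + 1  # +1 for the newline character
    return title, sentences


sample = "A title\n\nFirst sentence.\nSecond sentence.\n"
title, sentences = parse_article(sample)
for start, sentence in sentences:
    # Each stored offset points back at the sentence in the raw text.
    print(start, sample[start:start + len(sentence)])
```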

Here is an example article (we assume the article id is 123456):

⁰Manchin says Democrats acted like ³⁴babies⁴⁰ at the SOTU (video) Personal Liberty Poll Exercise your right to vote.

Democrat West Virginia Sen. Joe Manchin says his colleagues’ refusal to stand or applaud during President Donald Trump’s State of the Union speech was disrespectful and a signal that ²⁹⁹the party is more concerned with obstruction than it is with progress³⁶⁸.

In a glaring sign of just how ⁴⁰⁰stupid and petty⁴¹⁶ things have become in Washington these days, Manchin was invited on Fox News Tuesday morning to discuss how he was one of the only Democrats in the chamber for the State of the Union speech ⁶⁰⁷not looking as though Trump ⁶³⁵killed his grandma⁶⁵³.

When others in his party declined to applaud even for the most uncontroversial of the president’s remarks, Manchin did.

He even stood for the president when Trump entered the room, a customary show of respect for the office in which his colleagues declined to participate.

file: article123456.txt

Notice that the superscripts are not present in the original article file; we have added them here in order to be able to reference text spans. The text is noisy, which makes the task trickier: for example, in row 1 "Personal Liberty Poll Exercise your right to vote." is clearly not part of the title.

There are several propaganda techniques that were used in the article above:

The fragment “babies” on the first line (characters 34 to 40) is an instance of Name_Calling,Labeling
On the third line the fragment “the party is more concerned with obstruction than it is with progress” is an instance of Black-and-White_Fallacy
The fourth line has multiple propagandistic fragments
- “stupid and petty” is an instance of Loaded_Language;
- “not looking as though Trump killed his grandma” is an instance of Exaggeration,Minimization;
- “killed his grandma” is an instance of Loaded_Language

Gold Labels and Submission Format

Task 1 - Span Identification

The format of a tab-separated line of the gold label and the submission files for task SI is:

 id     begin_offset     end_offset

where id is the identifier of the article, begin_offset is the character where the covered span begins (included) and end_offset is the character where the covered span ends (not included). Therefore, a span ranges from begin_offset to end_offset-1. The first character of an article has index 0. The number of lines in the file corresponds to the number of fragments spotted. Notice that if the spans of two techniques overlap, for example "not looking as though Trump killed his grandma" (characters 607-653) and "killed his grandma" (characters 635-653), they are merged into one fragment (characters 607-653). This is the gold file for the article above, article123456.txt:

  123456     34     40
  123456    299    368
  123456    400    416
  123456    607    653

gold label SI file: article123456.task1-SI.labels
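The merging of overlapping technique spans into SI fragments can be sketched as follows. The function name merge_spans is hypothetical, and treating only strictly overlapping spans as mergeable (not spans that merely touch) is an assumption, since the text only mentions overlaps. Offsets follow the convention above: begin included, end not included.

```python
def merge_spans(spans):
    """Merge overlapping (begin, end) spans; end offsets are exclusive."""
    merged = []
    for begin, end in sorted(spans):
        if merged and begin < merged[-1][1]:
            # Overlaps the previous fragment: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([begin, end])
    return [tuple(span) for span in merged]


# Technique spans of the example article, including the nested one.
techniques = [(34, 40), (299, 368), (400, 416), (607, 653), (635, 653)]
for begin, end in merge_spans(techniques):
    # One tab-separated line per fragment: id, begin_offset, end_offset.
    print(f"123456\t{begin}\t{end}")
```

Applied to the five technique spans of the example article, this yields the four fragments listed in the gold SI file above.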

Task 2 - Technique Classification

The format of a tab-separated line of the gold label and the submission files for task TC is:

 id   technique    begin_offset     end_offset

where id is the identifier of the article, technique is one of the 14 technique classes of the TC task, begin_offset is the character where the covered span begins (included) and end_offset is the character where the covered span ends (not included). Therefore, a span ranges from begin_offset to end_offset-1. The first character of an article has index 0. The number of lines in the file corresponds to the number of techniques spotted (for this task overlapping techniques are not merged). This is the gold file for the article above, article123456.txt:

  123456    Name_Calling,Labeling      34     40
  123456    Black-and-White_Fallacy    299    368
  123456    Loaded_Language            400    416
  123456    Exaggeration,Minimization  607    653
  123456    Loaded_Language            635    653

gold label TC file: article123456.task2-TC.labels
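A minimal sketch of reading one line of a TC file and recovering the text of its span is shown below. The helper names parse_tc_line and span_text are hypothetical. Because end_offset is not included, the span maps directly onto a Python slice.

```python
def parse_tc_line(line):
    """Parse 'id<TAB>technique<TAB>begin_offset<TAB>end_offset'."""
    article_id, technique, begin, end = line.rstrip("\n").split("\t")
    return article_id, technique, int(begin), int(end)


def span_text(article_text, begin, end):
    """Return the fragment covered by [begin, end) in the article text."""
    return article_text[begin:end]


# First row of the example article, without the superscripts.
first_row = "Manchin says Democrats acted like babies at the SOTU (video)"
article_id, technique, begin, end = parse_tc_line(
    "123456\tName_Calling,Labeling\t34\t40\n"
)
print(technique, repr(span_text(first_row, begin, end)))
```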

Evaluation

Upon registration, participants will have access to their team page, where they can also download scripts for scoring both tasks. Here is a brief description of the evaluation measures the scorers compute.

Task SI

The SI task consists of identifying the propagandistic fragments. The evaluation function gives credit for partial matches between two spans. In a nutshell, the partial credit is proportional to the length of the intersection of the two spans, and it is normalized by the length of the spans. To learn more, check our detailed description.
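The sketch below shows one possible reading of this partial-credit idea for a single article; it is not the official scorer, which remains the authoritative reference. It assumes that precision normalizes each overlap by the length of the predicted span, that recall normalizes by the length of the gold span, and that the spans within each set do not overlap one another. The function names are hypothetical.

```python
def overlap(s, t):
    """Number of characters shared by two [begin, end) spans."""
    return max(0, min(s[1], t[1]) - max(s[0], t[0]))


def si_scores(predicted, gold):
    """Partial-match precision, recall and F1 for one article (a sketch)."""
    precision = recall = 0.0
    if predicted:
        # Credit per predicted span, normalized by the predicted span length.
        precision = sum(
            overlap(s, t) / (s[1] - s[0]) for s in predicted for t in gold
        ) / len(predicted)
    if gold:
        # Credit per gold span, normalized by the gold span length.
        recall = sum(
            overlap(s, t) / (t[1] - t[0]) for s in predicted for t in gold
        ) / len(gold)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Predicting only "killed his grandma" against the merged gold fragment.
p, r, f1 = si_scores(predicted=[(635, 653)], gold=[(607, 653)])
print(p, r, f1)
```

In this example the prediction lies entirely inside the gold fragment, so precision is 1, while recall is the covered share of the gold fragment, 18 out of 46 characters.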

Task TC

TC is a multi-class classification task. Notice that the distribution of the gold labels is rather imbalanced. Therefore the official evaluation measure for the task is the micro-averaged F₁ measure. In addition, we will report Precision and Recall.
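Micro-averaging can be illustrated with the short sketch below, which pools true positives, false positives and false negatives over all classes before computing the scores. It is not the official scorer; the function name micro_prf and the toy labels are chosen for illustration only.

```python
from collections import Counter


def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 for aligned label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    # Micro-averaging: sum the counts over classes, then compute the scores.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    recall = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["Loaded_Language", "Loaded_Language", "Name_Calling,Labeling", "Loaded_Language"]
pred = ["Loaded_Language", "Name_Calling,Labeling", "Name_Calling,Labeling", "Loaded_Language"]
print(micro_prf(gold, pred))
```

When exactly one label is predicted for every fragment, each error counts once as a false positive and once as a false negative, so micro-averaged precision, recall and F₁ all coincide with accuracy.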

Tentative Schedule

September 5th	Registration opens
September 5th	Release of the training and development sets
February 18th	Registration closes
February 19th	Release of the test set for task SI
March 2nd	Task SI test submissions site closes
March 3rd	Release of the test set for task TC
March 11th	Task TC test submissions site closes
May 15th	Paper submission deadline
June 24th	Notification to authors
July 24th	Camera-ready papers due
December 12-13	SemEval 2020 workshop@COLING


Organisation:

Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
Alberto Barrón-Cedeño, Università di Bologna
Preslav Nakov, Qatar Computing Research Institute, HBKU
Henning Wachsmuth, Paderborn University
Rostislav Petrov, A Data Pro

SemEval 2020 Task 11
"Detection of Propaganda Techniques in News Articles"
