The task is over. If you want to get the data, consider registering for the SemEval 2020 Task 11 shared task.
Propagandistic news articles use specific techniques to convey their message, such as whataboutism, red herring, and name calling, among many others. Help us develop automatic tools to detect such techniques!
We refer to propaganda whenever information is purposefully shaped to foster a predetermined agenda. Propaganda uses psychological and rhetorical techniques to achieve its purpose. Such techniques include the use of logical fallacies and appeals to the emotions of the audience. Logical fallacies are usually hard to spot since the argumentation, at first sight, might seem correct and objective; however, a careful analysis shows that the conclusion cannot be drawn from the premises without misusing logical rules. Another set of techniques uses emotional language to induce the audience to agree with the speaker solely on the basis of the emotional bond being created, provoking the suspension of any rational analysis of the argumentation. All of these techniques are intended to go unnoticed to achieve maximum effect.
The goal of the shared task is to produce models capable of spotting sentences and text fragments in which propaganda techniques are used in a news article.
You will be provided with a corpus of about 500 news articles in which specific propagandistic fragments have been manually spotted and labeled. Two tasks are defined on the corpus: FLC (fragment-level classification) and SLC (sentence-level classification), both described below.
The competition is divided into three phases.
The input for both tasks will be news articles in plain text format. In the first phase, participants will be provided with two folders, train-articles and dev-articles (in the second phase we will release a third folder with the test set). Each article appears in one .txt file. The title is on the first row, followed by an empty row. The content of the article starts from the third row, one sentence per line. Each article has been retrieved with the newspaper3k library, and sentence splitting has been performed automatically with the NLTK sentence splitter.
Here is an example article (we assume the article id is 123456):
1 | ⁰Manchin says Democrats acted like ³⁴babies⁴⁰ at the SOTU (video) Personal Liberty Poll Exercise your right to vote.
2 |
3 | Democrat West Virginia Sen. Joe Manchin says his colleagues’ refusal to stand or applaud during President Donald Trump’s State of the Union speech was disrespectful and a signal that ²⁹⁹the party is more concerned with obstruction than it is with progress³⁶⁸.
4 | In a glaring sign of just how ⁴⁰⁰stupid and petty⁴¹⁶ things have become in Washington these days, Manchin was invited on Fox News Tuesday morning to discuss how he was one of the only Democrats in the chamber for the State of the Union speech ⁶⁰⁷not looking as though Trump ⁶³⁵killed his grandma⁶⁵³.
5 | When others in his party declined to applaud even for the most uncontroversial of the president’s remarks, Manchin did.
6 | He even stood for the president when Trump entered the room, a customary show of respect for the office in which his colleagues declined to participate.
Notice that the line numbers and superscript offsets are not present in the original article file; we have added them here to be able to reference sentences and text spans. The text is noisy, which makes the task trickier: for example, in row 1, "Personal Liberty Poll Exercise your right to vote." is clearly not part of the title.
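To make the layout concrete, here is a minimal Python sketch for loading the released articles. The folder name and the load_articles helper are illustrative (not part of the released tools), but the layout it assumes, one article per .txt file named article&lt;id&gt;.txt with one sentence per line, follows the description above.

```python
import glob
import os

def load_articles(folder):
    """Read every .txt article in a folder into a dict {article_id: full text}.

    Assumes the naming scheme articleXXXXX.txt, where XXXXX is the article id
    (e.g. article123456.txt for the example above).
    """
    articles = {}
    for path in glob.glob(os.path.join(folder, "*.txt")):
        article_id = os.path.basename(path)[len("article"):-len(".txt")]
        with open(path, encoding="utf-8") as f:
            articles[article_id] = f.read()
    return articles

# Sentences are one per line, so sentence ids map directly to lines:
# sentences = load_articles("train-articles")["123456"].split("\n")
# sentence id i corresponds to sentences[i - 1]
```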
Several propaganda techniques are used in the article above; the gold annotations shown further below list the techniques together with the character spans they cover.
The format of a tab-separated line of the gold label and the submission files for task FLC is:
id technique begin_offset end_offset
where id is the identifier of the article, technique is one out of the 18 techniques, begin_offset is the character where the covered span begins (included) and end_offset is the character where the covered span ends (not included). Therefore, a span ranges from begin_offset to end_offset-1. The first character of an article has index 0. The number of lines in the file corresponds to the number of techniques spotted. This is the gold file for the article above, article123456.txt:
123456 Name_Calling,Labeling 34 40
123456 Black-and-White_Fallacy 299 368
123456 Loaded_Language 400 416
123456 Exaggeration,Minimization 607 653
123456 Loaded_Language 635 653
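As a sanity check on the offsets, the following sketch parses an annotation file in the format above and recovers an annotated span from the raw article text (read, for instance, with the load_articles helper sketched earlier). The function name is illustrative, not part of the released tools.

```python
def read_flc_annotations(path):
    """Parse an FLC gold/prediction file: one tab-separated annotation per line."""
    annotations = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            article_id, technique, begin, end = line.rstrip("\n").split("\t")
            annotations.append((article_id, technique, int(begin), int(end)))
    return annotations

# Offsets index characters of the raw article text, with end_offset excluded:
# text = load_articles("train-articles")["123456"]
# text[34:40]   # -> "babies", the Name_Calling,Labeling span above
```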
The format of a tab-separated line of the gold label and the submission files for task SLC is:
article_id sentence_id label
where article_id and sentence_id are the identifiers of the article and the sentence (the first sentence has id 1), and label is either propaganda or non-propaganda. Gold and submission files must have the same number of rows as the number of sentences, i.e. of lines, in the article. To help participants prepare a submission, we provide template prediction files, which have the same format as the gold files but with label replaced by ?. For example, the gold label and template files of task SLC for the article above would look as follows:
123456 1 propaganda
123456 2 non-propaganda
123456 3 propaganda
123456 4 propaganda
123456 5 non-propaganda
123456 6 non-propaganda
123456 1 ?
123456 2 ?
123456 3 ?
123456 4 ?
123456 5 ?
123456 6 ?
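To illustrate how the template files can be used, here is a hedged sketch that fills in the ? column with predictions. The function name, the file names in the usage comment, and the predict callable are all hypothetical placeholders for whatever model and paths you use; only the three-column tab-separated format follows the description above.

```python
def fill_slc_template(template_path, output_path, predict):
    """Write an SLC submission by replacing each '?' with a predicted label.

    `predict` is a hypothetical callable mapping (article_id, sentence_id)
    to "propaganda" or "non-propaganda".
    """
    with open(template_path, encoding="utf-8") as fin, \
         open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            article_id, sentence_id, _ = line.rstrip("\n").split("\t")
            label = predict(article_id, int(sentence_id))
            fout.write(f"{article_id}\t{sentence_id}\t{label}\n")

# Example: a trivial baseline that labels every sentence as non-propaganda
# fill_slc_template("template.txt", "submission.txt",
#                   lambda article_id, sentence_id: "non-propaganda")
```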
Upon registration, participants will have access to their team page, where they can also download scripts for scoring both tasks. Here is a brief description of the evaluation measures the scorers compute.
FLC is a composition of two tasks: the identification of the propagandistic text fragments and the identification of the technique used in each fragment (an 18-class classification task). While the F1 measure is appropriate for a multi-class classification task, we modify it to account for partial matching between spans. In addition, an F1 value is computed for each propaganda technique.
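The scorer available from the team page is the reference implementation. Purely as an illustration of how partial matching between spans can be credited, here is a sketch that weights each predicted/gold pair by its character overlap, assuming annotations are represented as (article_id, technique, begin, end) tuples as in the format above; it is not the official scorer.

```python
def overlap(span_a, span_b):
    """Number of characters shared by two [begin, end) spans."""
    return max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))

def partial_match_f1(pred, gold):
    """Illustrative partial-match F1 over (article_id, technique, begin, end) tuples.

    Each predicted/gold pair with the same article and technique contributes its
    character overlap, normalised by the predicted span length (for precision)
    or the gold span length (for recall).
    """
    def credit(normalise_on_pred):
        total = 0.0
        for p_id, p_tech, p_b, p_e in pred:
            for g_id, g_tech, g_b, g_e in gold:
                if p_id == g_id and p_tech == g_tech:
                    denom = (p_e - p_b) if normalise_on_pred else (g_e - g_b)
                    total += overlap((p_b, p_e), (g_b, g_e)) / denom
        return total

    precision = credit(True) / len(pred) if pred else 0.0
    recall = credit(False) / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```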
SLC is a binary classification task with imbalanced data. Therefore, the official evaluation measure for the task is the standard F1 measure. In addition, we will report precision and recall.
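For SLC, these standard measures can be reproduced, for example, with scikit-learn as a quick sanity check against the downloaded scorer; the label lists below are assumed to be the third column of the gold and submission files, read in the same order.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

def slc_scores(gold_labels, pred_labels):
    """Precision, recall, and F1 with 'propaganda' as the positive class."""
    return (precision_score(gold_labels, pred_labels, pos_label="propaganda"),
            recall_score(gold_labels, pred_labels, pos_label="propaganda"),
            f1_score(gold_labels, pred_labels, pos_label="propaganda"))
```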
Registration opens
Release of the training and development sets
Leaderboard opens
Release of the test set
Registration closes
Test set submission site closes
Release of the results on the test set
Participants paper submission deadline
Reviews submission deadline
Paper acceptance notification
Final paper submission deadline
NLP4IF Workshop at EMNLP
Some of the teams participating in the shared task have submitted a paper describing their approach to the NLP4IF workshop. Proceedings are available on the ACL Anthology website. Direct links to the (updated versions of the) shared task papers are listed below:
We have created a Google group for the task. Join it to ask questions and to interact with other participants.
Follow us on Twitter to get the latest updates on the competition!
If you need to contact only the organisers, send us an email.
Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
Alberto Barrón-Cedeño, Università di Bologna, Italy
Preslav Nakov, Qatar Computing Research Institute, HBKU
Data annotation has been provided
We thank for their help in advertising the task
The Shared Task is part of the 2019 Workshop on NLP4IF: Censorship, Disinformation, and Propaganda, co-located with the EMNLP-IJCNLP conference, November 3-7, 2019, Hong Kong.
This initiative is part of the Propaganda Analysis Project