COVIDLies v1.0

@tamannahossainkay tamannahossainkay released this 01 Mar 08:32
· 2 commits to master since this release
125e68a

To facilitate research in automatic COVID-19 misinformation detection, we introduce the COVIDLies dataset for misconception detection on Twitter. The dataset comprises 62 common misconceptions about the disease along with related tweets, identified and annotated by researchers from the UCI School of Medicine. Given a tweet, the annotations identify whether it expresses any of the known misconceptions and, if so, whether it propagates the misconception (agree/pos), is informative by contradicting it (disagree/neg), or is neither misinformative nor informative (no stance/na).

COVIDLies v1.0 consists of 6591 misconception-tweet pairs with expert-annotated stance labels. The dataset is still evolving, as annotation is ongoing.
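
As a rough illustration of the label scheme described above, the following Python sketch models one misconception-tweet pair and tallies stance labels. The field names (`misconception_id`, `tweet_id`, `label`) and the sample rows are hypothetical; check the released file for the actual column names.

```python
from collections import Counter
from dataclasses import dataclass

# Stance labels as described in the release notes.
STANCE_NAMES = {"pos": "agree", "neg": "disagree", "na": "no stance"}


@dataclass(frozen=True)
class Pair:
    """One misconception-tweet pair. Field names are assumed, not official."""
    misconception_id: int
    tweet_id: int
    label: str  # one of "pos", "neg", "na"


def stance_counts(pairs):
    """Count pairs per stance, rejecting labels outside the known set."""
    counts = Counter()
    for pair in pairs:
        if pair.label not in STANCE_NAMES:
            raise ValueError(f"unknown stance label: {pair.label!r}")
        counts[STANCE_NAMES[pair.label]] += 1
    return counts


# Made-up rows for illustration only; these are not real dataset entries.
sample = [
    Pair(1, 1001, "pos"),
    Pair(1, 1002, "na"),
    Pair(2, 1003, "neg"),
    Pair(2, 1004, "na"),
]
print(stance_counts(sample))
```

Rejecting unknown labels early is a design choice in this sketch; it may help catch files where the label vocabulary differs from the three values listed here.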

Note that the following changes to the misconceptions were made in this release, with re-annotation performed where needed.

  • Removal:
    • Political: Misconceptions pertaining to the actions of particular political parties, governments, religious groups, or ethnicities were removed. E.g., 'Trump is fulfilling his promise to hit Iranian cultural sites, if Iranians took revenge for the US airstrike that killed of Quds Force Commander Qasem Soleimani.'
    • Multi-modal: Misconceptions about non-textual modalities, such as images and videos, were removed. E.g., 'Coronavirus is a state-supported "a bioweapon that went rogue" and also fake videos alleging that Chinese authorities are killing citizens to prevent its spread.'
    • Duplicates: Misconceptions were de-duplicated. E.g., 'Holy communion cannot be the cause of the spread of coronavirus' was removed, while 'Coronavirus cannot be spread by practicing holy communion.' was kept.
  • Compound to atomic: Compound misconceptions were split into atomic misconceptions. E.g., 'Avocado and mint tea, hot whiskey and honey, essential oils, vitamins c and d, fennel tea and cocaine cure coronavirus.' --> 'Avocado and mint tea cures coronavirus.', 'Essential oils cure coronavirus.', 'Vitamin C cures coronavirus.', 'Vitamin D cures coronavirus.', 'Fennel tea cures coronavirus.', and 'Cocaine cures coronavirus.'
  • Corrections: E.g., 'There were more than 50000 cremations in Wuhan for 4th Quarter, 2020.' --> 'There were more than 50000 cremations in Wuhan for 4th Quarter, 2019.'
  • Edits: E.g., 'Chloroquine was used to cure over 12,000 covid-19 patients.' --> 'Chloroquine can cure coronavirus.'

To comply with Twitter’s Terms of Service, we are publicly releasing only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.
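
Because only Tweet IDs are released, the tweet text has to be retrieved separately before the pairs can be used. Below is a minimal sketch of the join step, assuming the text has already been fetched into a dictionary keyed by Tweet ID. The retrieval itself is not shown, and the names `attach_text`, `annotations`, and `texts` are hypothetical.

```python
def attach_text(annotations, texts):
    """Pair each annotation with its tweet text, if the text was retrieved.

    annotations: iterable of (misconception_id, tweet_id, label) tuples.
    texts: dict mapping tweet_id -> tweet text (assumed already fetched).
    Returns (matched, missing_ids); missing tweets may be deleted or private.
    """
    matched, missing_ids = [], []
    for misconception_id, tweet_id, label in annotations:
        text = texts.get(tweet_id)
        if text is None:
            missing_ids.append(tweet_id)
        else:
            matched.append((misconception_id, tweet_id, label, text))
    return matched, missing_ids


# Made-up IDs and text for illustration only.
annotations = [(1, 1001, "pos"), (2, 1003, "neg")]
texts = {1001: "example tweet text"}
matched, missing_ids = attach_text(annotations, texts)
```

Returning the unmatched IDs separately makes it easy to report how many pairs could not be recovered.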