Experts and Authorities receive disproportionate attention on Twitter during the COVID-19 crisis


This repository contains annotation data and machine-learning models related to the paper "Experts and Authorities receive disproportionate attention on Twitter during the COVID-19 crisis".

If you intend to use any of these materials, please make sure to cite the work accordingly:

Gligorić, Kristina et al. “Experts and authorities receive disproportionate attention on Twitter during the COVID-19 crisis.” (2020).

@misc{gligori2020experts,
    title={Experts and authorities receive disproportionate attention on Twitter during the COVID-19 crisis},
    author={Kristina Gligorić and Manoel Horta Ribeiro and Martin Müller and Olesia Altunina and Maxime Peyrard and Marcel Salathé and Giovanni Colavizza and Robert West},
    year={2020},
    eprint={2008.08364},
    archivePrefix={arXiv},
    primaryClass={cs.SI}
}

Annotation data

User descriptions have been annotated by type and category. Find the data here.

The CSV file has the following columns:

| Column | Description |
| --- | --- |
| user.id | Twitter user ID |
| category | Consensus category (collapsed) |
| type | Consensus type |
| tweeting_lang | Language the user usually tweets in |
| bio_lang | Language the bio (user description) is written in |
| type_1 | Type annotation by annotator 1 |
| type_2 | Type annotation by annotator 2 |
| type_3 | Type annotation by annotator 3 |
| type_4 | Type annotation by annotator 4 (if available) |
| category_1 | Categories (uncollapsed) by annotator 1 |
| category_2 | Categories (uncollapsed) by annotator 2 |
| category_3 | Categories (uncollapsed) by annotator 3 |
| category_4 | Categories (uncollapsed) by annotator 4 (if available) |
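As a sketch of working with this schema (the sample row below is fabricated for illustration; for the real data, open the annotation CSV from this repository), the columns can be read with Python's standard csv module:

```python
import csv
import io

# Fabricated sample row following the schema above; replace this
# in-memory buffer with the actual annotation CSV file.
sample = io.StringIO(
    "user.id,category,type,tweeting_lang,bio_lang,"
    "type_1,type_2,type_3,type_4,"
    "category_1,category_2,category_3,category_4\n"
    "12345,science,individual,en,en,"
    "individual,individual,individual,,"
    "Science: Life Sciences,Science: Life Sciences,Science: Other Sciences,\n"
)

rows = list(csv.DictReader(sample))
print(rows[0]["category"], rows[0]["type"])  # the consensus labels for this user
```

Note that type_4 and category_4 are empty when only three annotators labeled a user.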

The annotations contain the following labels:

  • Category (collapsed): "art", "business", "healthcare", "media", "ngo", "other", "political_supporter", "politics", "adult_content", "public_services", "religion", "science", "sports"
  • Type: "individual", "institution", "unclear"

Additionally, the following category labels have a more fine-grained (uncollapsed) annotation:

  • "media": "Media: News", "Media: Scientific News and Communication", "Media: Other Media"
  • "science": "Science: Engineering and Technology", "Science: Life Sciences", "Science: Social Sciences", "Science: Other Sciences"
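As an illustrative sketch (the mapping simply restates the lists above; the helper name is not part of the repository), fine-grained labels can be collapsed back to their coarse category with a small lookup:

```python
# Map the fine-grained (uncollapsed) labels listed above
# to their collapsed category.
UNCOLLAPSED_TO_COLLAPSED = {
    "Media: News": "media",
    "Media: Scientific News and Communication": "media",
    "Media: Other Media": "media",
    "Science: Engineering and Technology": "science",
    "Science: Life Sciences": "science",
    "Science: Social Sciences": "science",
    "Science: Other Sciences": "science",
}

def collapse(label: str) -> str:
    """Collapse a fine-grained annotation; all other labels are already collapsed."""
    return UNCOLLAPSED_TO_COLLAPSED.get(label, label)

print(collapse("Media: News"))       # media
print(collapse("politics"))          # politics (unchanged)
```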

Please refer to our paper for a more detailed explanation of the individual annotations and of how the annotation was performed.

Text classification models

Based on the annotation data provided above, several models have been trained on the category (collapsed) and type objectives.

The available models are of two types (BERT-like and fastText). For more information on the training procedure, please see the SI of the paper.

BERT

The easiest way to use the BERT models is to load the PyTorch checkpoints with the huggingface/transformers library. To install it, run:

pip install transformers

Then download a suitable PyTorch checkpoint from the table below, extract it using tar -xzf <tar_file>, and run:

from transformers import pipeline

path_to_model = './category_bert_multilang_pt/'

# We are using the sentiment-analysis type (even though our model is not a sentiment analysis model)
pipe = pipeline('sentiment-analysis', model=path_to_model, tokenizer=path_to_model)

# Feed an example input
pipe('artiste et paintre')
# output:
# [{'label': 'art', 'score': 0.9069588780403137}]

The TF checkpoints are meant to be used with tensorflow/models.

FastText

To use the FastText models, install the fasttext package:

pip install fasttext

Download & extract one of the FastText models and run

import fasttext

model = fasttext.load_model('./category_fasttext/model.bin')
print(model.predict('virologist'))
# (('__label__science',), array([0.98916745]))
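fastText returns labels with a __label__ prefix, as in the output above. A small helper (hypothetical, not part of this repository) can strip the prefix and pair each label with its score, which is handy when requesting the top k categories via model.predict(text, k=3):

```python
def clean_predictions(labels, probs):
    """Strip fastText's '__label__' prefix and pair each label with its score."""
    return [(label.removeprefix("__label__"), float(p))
            for label, p in zip(labels, probs)]

# With the model loaded as above, one would call:
#   labels, probs = model.predict('virologist', k=3)
#   clean_predictions(labels, probs)
print(clean_predictions(("__label__science",), [0.98916745]))
```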

Download models

| Description | Language | Dataset | Identifier | Size | Download |
| --- | --- | --- | --- | --- | --- |
| BERT multilingual category (BERT-multilingual cased) | multilang | category | bert-multilang-pt | 630MB | PyTorch \| TF |
| BERT multilingual type (BERT-multilingual cased) | multilang | type | bert-multilang-pt | 630MB | PyTorch \| TF |
| BERT English category (BERT-large uncased) | en | category | bert-english-pt | 1.2GB | PyTorch \| TF |
| BERT English type (BERT-large uncased) | en | type | bert-english-pt | 1.2GB | PyTorch \| TF |
| FastText English category | en | category | fasttext-english | 200MB | Download |
| FastText English type | en | type | fasttext-english | 426MB | Download |
