medialab/twitter-incremental-clustering

Workshop 2024 on Event Detection

This repository contains the Python code to reproduce the experiments presented in our paper:

An Incremental Clustering Baseline for Event Detection on Twitter.

Installation

We encourage you to create a virtual environment running Python 3.8.2 and install the dependencies inside it. Below are two examples, one with conda and one with pyenv-virtualenv.

With conda

git clone https://github.com/medialab/workshop-event-detection.git
cd workshop-event-detection
conda create -n workshop python=3.8.2
conda activate workshop
pip install -U pip setuptools
pip install -r requirements.txt

With pyenv-virtualenv

git clone https://github.com/medialab/workshop-event-detection.git
cd workshop-event-detection
pyenv virtualenv 3.8.2 workshop
pyenv activate workshop
pip install -U pip setuptools
pip install -r requirements.txt
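
Once the environment is active, a quick check can confirm that the interpreter matches the version these instructions target. The sketch below is not part of the repository: `matches_required_version` is a hypothetical helper, and comparing only the major and minor numbers by default is an assumption (the instructions above pin 3.8.2 exactly).

```python
import sys

REQUIRED = (3, 8)  # major.minor targeted by the instructions above (3.8.2)


def matches_required_version(current=None, required=REQUIRED):
    """Return True if `current` (default: the running interpreter) matches `required`.

    Only the leading components present in `required` are compared.
    """
    if current is None:
        current = tuple(sys.version_info[:3])
    return tuple(current[: len(required)]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if matches_required_version() else "mismatch"
    print(f"Python {found}: {status}")
```

Pass `required=(3, 8, 2)` to enforce the exact patch release.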

Download data

We test our method on two datasets, Event2012 [McMinn et al., 2013] and Event2018 [Mazoyer et al., 2020]. Follow the instructions by [Cao et al., 2024] here to download the data, then place the entire ./raw_data folder in the root folder of this repository.
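
Before preprocessing, it may help to verify that the data landed in the expected place. The following is a minimal sketch, assuming only that a non-empty `raw_data` folder should sit at the repository root. `raw_data_ready` is a hypothetical helper that is not part of the repository, and it does not check the names or contents of the downloaded files.

```python
from pathlib import Path


def raw_data_ready(repo_root="."):
    """Return True if `<repo_root>/raw_data` exists and contains at least one entry."""
    raw_dir = Path(repo_root) / "raw_data"
    return raw_dir.is_dir() and any(raw_dir.iterdir())


if __name__ == "__main__":
    if raw_data_ready():
        print("raw_data found")
    else:
        print("raw_data is missing or empty; download the datasets first")
```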

Preprocess data

python preprocess.py

Run event detection

  1. Run event detection on the Event2018 dataset with Sentence-CamemBERT Large (GPU required):
     python run_detection.py --model sbert --sub-model "dangvantuan/sentence-camembert-large" --lang fr --dataset event2018.tsv
  2. Run event detection on the Event2012 dataset with all-mpnet-base-v2 (GPU required):
     python run_detection.py --model sbert --sub-model "sentence-transformers/all-mpnet-base-v2" --lang en --dataset event2012.tsv
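
This README does not describe the clustering procedure itself. As a rough illustration of what threshold-based incremental clustering over sentence embeddings can look like, here is a generic sketch in pure Python. It is not the implementation in `run_detection.py`: the function names, the centroid update, and the 0.8 similarity threshold are all assumptions chosen for illustration, and the toy 2-D vectors stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def incremental_clustering(vectors, threshold=0.8):
    """Assign each vector, in arrival order, to a cluster label.

    A vector joins the cluster whose centroid is most similar to it when that
    similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    sums = []    # per-cluster running sum of member vectors
    counts = []  # per-cluster member count
    labels = []
    for vec in vectors:
        best_label, best_sim = None, -1.0
        for label, (total, n) in enumerate(zip(sums, counts)):
            centroid = [x / n for x in total]
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label is not None and best_sim >= threshold:
            # Close enough to an existing cluster: add the vector to it.
            sums[best_label] = [s + x for s, x in zip(sums[best_label], vec)]
            counts[best_label] += 1
            labels.append(best_label)
        else:
            # No sufficiently similar cluster: open a new one.
            sums.append(list(vec))
            counts.append(1)
            labels.append(len(sums) - 1)
    return labels


# Toy 2-D "embeddings": two near-parallel pairs pointing in different directions.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(incremental_clustering(toy, threshold=0.8))  # prints [0, 0, 1, 1]
```

Because documents are processed one at a time and never reassigned, the result depends on arrival order. A lower threshold merges more documents into fewer, broader clusters.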

Generate LaTeX table

python generate_table.py
