This repository contains the Python code to reproduce the experiments presented in our paper:
An Incremental Clustering Baseline for Event Detection on Twitter.
We encourage you to create a virtual environment with Python 3.8.2. Below are two examples: one with conda, the other with pyenv-virtualenv.
With conda:
git clone https://github.com/medialab/workshop-event-detection.git
cd workshop-event-detection
conda create -n workshop python=3.8.2
conda activate workshop
pip install -U pip setuptools
pip install -r requirements.txt
With pyenv-virtualenv:
git clone https://github.com/medialab/workshop-event-detection.git
cd workshop-event-detection
pyenv virtualenv 3.8.2 workshop
pyenv activate workshop
pip install -U pip setuptools
pip install -r requirements.txt
We test our method on two datasets: Event2012 [McMinn et al., 2013] and Event2018 [Mazoyer et al., 2020]. Follow the instructions by [Cao et al., 2024] here to download the data, then place the entire ./raw_data folder in the root of the repository.
python preprocess.py
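If preprocessing fails because the data is not where the script expects it, a quick check like the one below may help. It is only a sketch: it assumes nothing beyond the ./raw_data location described above, and the helper name `check_raw_data` is hypothetical, not part of this repository.

```python
from pathlib import Path


def check_raw_data(root="."):
    """Return the raw_data path under `root`, or raise if it is missing or empty.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    raw_dir = Path(root) / "raw_data"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"Expected a 'raw_data' folder at {raw_dir}")
    # An empty folder usually means the download step was skipped.
    if not any(raw_dir.iterdir()):
        raise FileNotFoundError(f"{raw_dir} exists but is empty")
    return raw_dir
```

Calling `check_raw_data()` from the repository root before running preprocess.py either returns the folder path or raises a `FileNotFoundError` explaining what is missing.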
- Run event detection on the Event2018 dataset with Sentence-CamemBERT Large (GPU required):
python run_detection.py --model sbert --sub-model "dangvantuan/sentence-camembert-large" --lang fr --dataset event2018.tsv
- Run event detection on the Event2012 dataset with all-mpnet-base-v2 (GPU required):
python run_detection.py --model sbert --sub-model "sentence-transformers/all-mpnet-base-v2" --lang en --dataset event2012.tsv
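To launch both runs in sequence, the two commands above can be wrapped in a small driver. The sketch below reuses only the flags and values shown in those commands; the names `RUNS`, `build_command` and `run_all` are hypothetical and do not exist in this repository. By default it only prints the commands (dry run), since the actual runs require a GPU.

```python
import subprocess
import sys

# (sub-model, language, dataset file), copied from the two commands above.
RUNS = [
    ("dangvantuan/sentence-camembert-large", "fr", "event2018.tsv"),
    ("sentence-transformers/all-mpnet-base-v2", "en", "event2012.tsv"),
]


def build_command(sub_model, lang, dataset):
    """Build the argument list for one run_detection.py invocation."""
    return [
        sys.executable, "run_detection.py",
        "--model", "sbert",
        "--sub-model", sub_model,
        "--lang", lang,
        "--dataset", dataset,
    ]


def run_all(dry_run=True):
    """Print each command, and execute it unless dry_run is True."""
    for sub_model, lang, dataset in RUNS:
        cmd = build_command(sub_model, lang, dataset)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing `dry_run=False` executes the runs one after the other from the repository root, using the same Python interpreter as the active virtual environment.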
python generate_table.py