Code accompanying the paper "ConstraintMatch for Semi-constrained Clustering", published at the 2023 International Joint Conference on Neural Networks (IJCNN).
The codebase is licensed via the MIT license
- create a
python=3.7
miniconda environment and activate that environment - install the required packages via
pip install -r requirements.txt
- install the local package
sscc
viapip install -e .
from the main folder - run the
get_pretrained_scan_weights.sh
script to retrieve all pretrained model weights from an anonymous GDrive - test your install via
python run.py --config configs/intro.yaml
from theexperiments
level to- download the cifar10 dataset in a new
experiments/data/
folder and sample constraints accordingly over the folds - train ConstraintMatch on in this setting with pre-training (loading previously downloaded model weights) for 100 training steps
- store the results via
mlflow
logging
- download the cifar10 dataset in a new
We work with config files in this codebase (stored in experiments/configs/
). These allow you to run different models in different settings on different datasets. See some examples below (all model runs executed from the sscc/experiments
level):
ConstraintMatch:
python run.py --config configs/constraintmatch_cifar10.yaml
for ConstraintMatch on Cifar10 with 10k constraintspython run.py --config configs/constraintmatch_stl10.yaml
for ConstraintMatch on STL-10 with 10k constraintspython run.py --config configs/constraintmatch_overcluster_cifar20.yaml
for ConstraintMatch on Cifar100-20 with 10k constraints in the overclustering setting withnout=100
instead of the truenumclasses=20
Constrained Baseline:
python run.py --config configs/constrained_cifar10.yaml
for the constrained baseline on Cifar10 with 10k constraints
Pseudo-Labeling Competitor:
python run.py --config configs/pseudolabel_cifar20.yaml
for the pseudo-labeling competitor only using pseudo-labels on Cifar20 with 10k constraints