An Interpretable Deep Learning Approach for Morphological Script Type Analysis (IWCP 2024)

https://learnable-handwriter.github.io/

Malamatenia Vlachou Efstathiou, Ioannis Siglidis, Dominique Stutzmann and Mathieu Aubry

(Figure: LTW_graph.png)

  • For minimal inference on pre-trained and finetuned models without a local installation, we provide a standalone Colab notebook, also available as inference.ipynb.

  • A figures.ipynb notebook is provided to reproduce the paper's results and graphs. You'll need to download and extract datasets.zip and runs.zip into the base folder first (see the extraction sketch below).
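If you prefer doing the extraction in Python, here is a minimal sketch, assuming both archives have already been downloaded into the repository root (their download location is not part of this snippet):

    # Extract datasets.zip and runs.zip into the repository root.
    import zipfile
    from pathlib import Path

    base = Path(".")  # repository root
    for archive in ("datasets.zip", "runs.zip"):
        path = base / archive
        if path.exists():
            with zipfile.ZipFile(path) as zf:
                zf.extractall(base)  # creates datasets/ and runs/
            print(f"extracted {archive}")
        else:
            print(f"{archive} not found - download it first")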

Getting Started

Install
 conda create --name ltw pytorch==2.1.1 torchvision==0.15.0 cudatoolkit=11.3 -c pytorch -c conda-forge
 conda activate ltw
 python -m pip install -r requirements.txt

Run it from scratch on our dataset

Train

In this case you only need to download and extract datasets.zip.

Train our reference model with:

 python scripts/train.py iwcp_south_north.yaml 
Finetune

1. Our Northern and Southern Textualis models with:

python scripts/finetune_scripts.py -i runs/iwcp_south_north/train/ -o runs/iwcp_south_north/finetune/ --mode g_theta --max_steps 2500 --invert_sprites --script Northern_Textualis Southern_Textualis -a datasets/iwcp_south_north/annotation.json -d datasets/iwcp_south_north/ --split train

2. Our document models with:

python scripts/finetune_docs.py -i runs/iwcp_south_north/train/ -o runs/iwcp_south_north/finetune/ --mode g_theta --max_steps 2500 --invert_sprites -a datasets/iwcp_south_north/annotation.json -d datasets/iwcp_south_north/ --split all

Run it on your data

Create your config files:

1. Create a config file for the dataset:

configs/dataset/<DATASET_ID>.yaml
...

DATASET-TAG:                 
  path: <DATASET-NAME>/      
  sep: ''                    # How the character separator is denoted in the annotation. 
  space: ' '                 # How the space is denoted in the annotation.

2. Then create a second config file setting the hyperparameters:

configs/<DATASET_ID>.yaml
...

For its structure, see the config file provided for our experiment (configs/iwcp_south_north.yaml).

Create your dataset folder:

3. Create the dataset folder:

datasets/<DATASET-NAME>
├── annotation.json
└── images
  ├── <image_id>.png 
  └── ...

The annotation.json file should be a dictionary with entries of the form:

    "<image_id>": {
        "split": "train",                            # {"train", "val", "test"} - "val" is ignored in the unsupervised case.
        "label": "A beautiful calico cat."           # The text that corresponds to this line.
        "script": "Times_New_Roman"                  # (optional) Corresponds to the script type of the image
    },

For unsupervised training without evaluation, you can omit the annotation.json file entirely.
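As an illustration, here is a minimal sketch for assembling such a file; the image ids, labels, and script names are placeholders, not values from our dataset:

    # Illustrative sketch: write an annotation.json in the format described above.
    # Image ids, labels, and script names are placeholders.
    import json
    from pathlib import Path

    annotations = {
        "line_0001": {
            "split": "train",
            "label": "A beautiful calico cat.",
            "script": "Times_New_Roman",  # optional
        },
        "line_0002": {
            "split": "test",
            "label": "Another transcribed line.",
        },
    }

    dataset_dir = Path("datasets") / "<DATASET-NAME>"  # replace with your dataset folder
    with open(dataset_dir / "annotation.json", "w", encoding="utf-8") as f:
        json.dump(annotations, f, ensure_ascii=False, indent=4)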

Train and finetune

4. Train with

   python scripts/train.py <CONFIG_NAME>.yaml

5. Finetune

  • On a group of documents defined by their "script" type with:
python scripts/finetune_scripts.py -i runs/<MODEL_PATH> -o <OUTPUT_PATH> --mode g_theta --max_steps <int> --invert_sprites --script '<SCRIPT_NAME>' -a <DATASET_PATH>/annotation.json -d <DATASET_PATH> --split <train or all>
  • On individual documents with:
python scripts/finetune_docs.py -i runs/<MODEL_PATH> -o <OUTPUT_PATH> --mode g_theta --max_steps <int> --invert_sprites -a <DATASET_PATH>/annotation.json -d <DATASET_PATH> --split <train or all>

[!NOTE] To ensure a consistent set of characters regardless of the annotation source, we internally apply choco-mufin with a disambiguation-table.csv that normalizes or excludes characters from the annotations. The current configuration suppresses allographs and editorial signs (e.g., modern punctuation) to obtain a graphetic result.
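For intuition, here is an illustrative sketch of what such table-driven normalization does. It is not choco-mufin's actual API; the column names and example mappings are assumptions:

    # Illustrative sketch of table-driven character normalization (not choco-mufin's API).
    # Assumes a CSV with columns "char" and "replacement"; an empty replacement drops the character.
    import csv

    def load_table(path):
        table = {}
        with open(path, encoding="utf-8") as f:
            for row in csv.DictReader(f):
                table[row["char"]] = row["replacement"]
        return table

    def normalize(label, table):
        return "".join(table.get(ch, ch) for ch in label)

    # table = load_table("disambiguation-table.csv")  # load your own table
    # Hypothetical inline table: map the allograph 'ſ' (long s) to 's' and drop ';'.
    table = {"ſ": "s", ";": ""}
    print(normalize("deſcriptio;", table))  # -> "descriptio"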

Cite us

@misc{vlachou2024interpretable,
    title = {An Interpretable Deep Learning Approach for Morphological Script Type Analysis},
    author = {Vlachou-Efstathiou, Malamatenia and Siglidis, Ioannis and Stutzmann, Dominique and Aubry, Mathieu},
    publisher = {Document Analysis and Recognition -- ICDAR 2024 Workshops: Athens, Greece, August 30 -- September 4, 2024, Proceedings},
    year = {2024},
    organization = {Springer},
    url = {https://arxiv.org/abs/2408.11150}}

See also: Siglidis, I., Gonthier, N., Gaubil, J., Monnier, T., & Aubry, M. (2023). The Learnable Typewriter: A Generative Approach to Text Analysis.

Acknowledgements

This study was supported by the CNRS through MITI and the 80|Prime program (CrEMe, Caractérisation des écritures médiévales), and by the European Research Council (ERC project DISCOVER, number 101076028). We thank Ségolène Albouy, Raphaël Baena, Sonat Baltacı, Syrine Kalleli, and Elliot Vincent for valuable feedback on the paper.
