e-ditiones/Annotator

Annotator

This script performs segmentation, lemmatization, normalization and NER on XML-TEI encoded files.

Getting started

TO DO

Normalization and NER are still a work in progress.

To install Annotator from the command line:

  • clone or download this repository
git clone https://github.com/e-ditiones/Annotator.git
cd Annotator

How to use it

  1. Place the XML files to be processed in the in_XML folder.

  2. Run the script

bash process.sh
  3. The results are written to the out folder:
    • XML: contains the annotated XML files;
    • TSV: contains the annotations in TSV format.
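
The first step can be scripted. The sketch below is a hypothetical helper, not part of this repository: stage_inputs is an invented name, and the only detail taken from this README is the in_XML folder. It copies every .xml file from a source directory into the input folder and prints how many files were copied.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Annotator).
# Usage: stage_inputs SRC_DIR [DEST_DIR]
# Copies every *.xml file from SRC_DIR into DEST_DIR (default: in_XML)
# and prints the number of files copied.
stage_inputs() {
  src="$1"
  dest="${2:-in_XML}"
  mkdir -p "$dest"
  count=0
  for f in "$src"/*.xml; do
    # When the glob matches nothing, the pattern itself is returned; skip it.
    [ -e "$f" ] || continue
    cp -- "$f" "$dest"/
    count=$((count + 1))
  done
  echo "$count"
}
```

From the repository root, a run could then look like `stage_inputs ~/my_corpus && bash process.sh`, where ~/my_corpus is a placeholder path. Non-XML files in the source directory are left untouched.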

How it works

Lemmatization

For lemmatization, we use Pie-extended with the "freem" model.
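
To try the lemmatizer on its own, Pie-extended provides a command-line interface. The sketch below only illustrates that standalone interface on a plain-text file. It is an assumption that this matches what process.sh does internally, and the function name lemmatize_plain_text and the PIE_CMD variable are invented for this example.

```shell
#!/usr/bin/env bash
# One-time setup, run manually (assumes a working Python environment):
#   pip install pie-extended
#   pie-extended download freem
#
# Hypothetical wrapper (not part of Annotator): tags one plain-text file
# with the "freem" model. PIE_CMD allows substituting another executable,
# e.g. for a dry run; it defaults to the pie-extended command.
lemmatize_plain_text() {
  "${PIE_CMD:-pie-extended}" tag freem "$1"
}
```

Calling `lemmatize_plain_text my_text.txt` runs `pie-extended tag freem my_text.txt`, where my_text.txt is a placeholder file name.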

Credits

This repository is developed by Alexandre Bartz with the help of Simon Gabay, as part of the project e-ditiones.

Licences

Our work is licensed under a Creative Commons Attribution 4.0 International Licence.

Pie-extended is under the Mozilla Public License 2.0.

Cite this repository (to be changed)

Alexandre Bartz, Simon Gabay. 2020. Lemmatization and normalization of French modern manuscripts and printed documents. Retrieved from https://github.com/e-ditiones/Annotator.

About

Automatic annotation of XML encoded files.
