From 10eea6a2bf4aa1690844a28a8f1e40bc6a898b02 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thibault=20Cl=C3=A9rice?=
Date: Mon, 22 Jun 2020 17:35:15 +0200
Subject: [PATCH] 0.0.1 - First Release

---
 README.md | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3441566..a315a53 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,40 @@
-# Greek Lemmatization and Morpho-Syntactic Data
+# Ancient Greek Lemmatization and Morpho-Syntactic Data
 
 ## Referentials
 
 Lemma are from the *Henry George Liddell, Robert Scott, A Greek-English Lexicon*
 
+## Scores
+
+### POS
+
+| | accuracy | precision | recall | support |
+|------------------|----------|-----------|--------|---------|
+| all | 0.9515 | 0.7776 | 0.7375 | 71235 |
+| known-tokens | 0.9555 | 0.7663 | 0.73 | 66241 |
+| unknown-tokens | 0.8989 | 0.5279 | 0.5096 | 4994 |
+| ambiguous-tokens | 0.9244 | 0.7384 | 0.7169 | 35332 |
+
+### Lemma
+
+| | accuracy | precision | recall | support |
+|------------------|----------|-----------|--------|---------|
+| all | 0.9592 | 0.7938 | 0.7904 | 71235 |
+| known-tokens | 0.9704 | 0.8963 | 0.8994 | 66241 |
+| unknown-tokens | 0.8106 | 0.6406 | 0.6321 | 4994 |
+| ambiguous-tokens | 0.9272 | 0.5782 | 0.6005 | 22879 |
+
+### Lemma without diacritics
+
+| | accuracy | precision | recall | support |
+|------------------|----------|-----------|--------|---------|
+| all | 0.9613 | 0.8275 | 0.824 | 71235 |
+| known-tokens | 0.9714 | 0.9185 | 0.9199 | 66241 |
+| unknown-tokens | 0.827 | 0.6787 | 0.672 | 4994 |
+| ambiguous-tokens | 0.9301 | 0.6609 | 0.6707 | 21002 |
+| unknown-targets | 0.9497 | 0.8259 | 0.8223 | 51804 |
+
+
 ## Script
 
 1. Run `build.py` to get the "simple" training data
@@ -46,7 +77,9 @@ Mozilla Public Licence
 
 
 91 chars found
+
 | Char | Count |
+| ---- | ----- |
 | | 7743 |
 | " | 4219 |
 | % | 4 |
@@ -137,4 +170,5 @@ Mozilla Public Licence
 | ’ | 5404 |
 | “ | 4 |
 | † | 74 |
-| ⏑ | 4 |
\ No newline at end of file
+| ⏑ | 4 |
+