From 649600e108e0d11d07908fc906c10f0d915e12e0 Mon Sep 17 00:00:00 2001
From: Frank Sachsenheim
Date: Sat, 11 May 2024 13:47:34 +0200
Subject: [PATCH] integrations-tests: Amends README with instructive information

---
 integration-tests/README.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/integration-tests/README.md b/integration-tests/README.md
index 76ed101..8ff5ab7 100644
--- a/integration-tests/README.md
+++ b/integration-tests/README.md
@@ -1,20 +1,24 @@
 # Integration tests against corpora
 
 This folder serves as playground for tests of basic functionality against many
-XML documents, mostly TEI-encodings. They are supposed to be executed with
+XML documents, mostly TEI encodings. They are supposed to be executed with
 major code change proposals and before releases.
 
 ## Test corpus
 
 Place document collections into the `corpora` folder. The `fetch-corpora.py`
-script helps to get going with the minimal requirement (~3GB) of tests.
+script helps to get going with the minimal requirement (~6GB) of data.
+Set any non-empty string as environment variable `SKIP_EXISTING` to skip
+downloading a corpus whose target folder already exists.
 
 Due to the `lb` tag [issue](https://github.com/deutschestextarchiv/dtabf/issues/33)
 with the DTABf the DTA corpus isn't considered. It could be an experiment to use
 *delb* for transformations with regards to the conclusions of that issue.
 
 The `normalize-corpora.py` script addresses issues that were found in the text
-encodings and must be run before the tests.
+encodings and must be run after fetching test data.
+One of the corpus folder names can be passed as argument to the script in order
+to process only this one's contents.
 
 ## Tests
 
@@ -25,3 +29,8 @@ reserves a `report.txt` for messages redirected from *stdout*.
 
 When problems occur, carefully investigate that it's not due to the source, and
 if not extract simple enough cases for the unit tests.
+
+## TODO
+
+After adding the third kind of test, wrap all scripts here into a
+[textual](https://textual.textualize.io) app.
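
Reviewer note: the two behaviours this patch documents (skipping a download when `SKIP_EXISTING` is set, and limiting normalization to one corpus folder) could look roughly like the sketch below. This is only an illustration under assumptions, not the code of `fetch-corpora.py` or `normalize-corpora.py`. The helper names `should_skip` and `select_corpora` are hypothetical; only the `SKIP_EXISTING` variable, the `corpora` folder, and the optional folder-name argument come from the README text.

```python
import os
from pathlib import Path

CORPORA_DIR = Path("corpora")  # folder name as given in the README


def should_skip(target: Path, environ=os.environ) -> bool:
    """Hypothetical helper: True if SKIP_EXISTING is non-empty and target exists."""
    # An unset or empty variable means "always download".
    return bool(environ.get("SKIP_EXISTING")) and target.is_dir()


def select_corpora(root: Path, argv: list[str]) -> list[Path]:
    """Hypothetical helper: the one folder named in argv, else all sub-folders."""
    if argv:
        candidate = root / argv[0]
        if not candidate.is_dir():
            raise SystemExit(f"No such corpus folder: {candidate}")
        return [candidate]
    # Without an argument, process every corpus folder in a stable order.
    return sorted(path for path in root.iterdir() if path.is_dir())
```

With helpers like these, a fetch loop would call `should_skip(CORPORA_DIR / name)` before each download, and a normalizing script would iterate over `select_corpora(CORPORA_DIR, sys.argv[1:])`.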