diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index fcf70d05f9..7e2a849de3 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -40,6 +40,7 @@ If you are wondering how to do something with docTR, or a more general question,
 Install all additional dependencies with the following command:
 
 ```shell
+python -m pip install --upgrade pip
 pip install -e .[dev]
 pre-commit install
 ```
@@ -75,12 +76,15 @@ make style
 
 ### Modifying the documentation
 
-In order to check locally your modifications to the documentation:
+The documentation is built with `sphinx` by our CI.
+You can also build it locally:
 
 ```shell
 make docs-single-version
 ```
 
+Please note that files that have not been modified will not be rebuilt. If you want to force a complete rebuild, you can delete the `_build` directory. Additionally, you may need to clear your web browser's cache to see the modifications.
+
 You can now open your local version of the documentation located at `docs/_build/index.html` in your browser
 
 ## Let's connect
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000000..972bda511d
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,13 @@
+# Contribute to Documentation
+
+Please have a look at our [contribution guide](../CONTRIBUTING.md) to see how to install
+the development environment and how to generate the documentation.
+
+To install only the `docs` environment, you can do:
+
+```bash
+# Make sure you are at the root of the repository before executing these commands
+python -m pip install --upgrade pip
+pip install -e .[tf] # or .[torch]
+pip install -e .[docs]
+```
diff --git a/docs/source/modules/models.rst b/docs/source/modules/models.rst
index 40c2f1e6e9..95cf02e830 100644
--- a/docs/source/modules/models.rst
+++ b/docs/source/modules/models.rst
@@ -73,7 +73,7 @@ doctr.models.recognition
 
 .. autofunction:: doctr.models.recognition.vitstr_base
 
-.. autofunction:: doctr.models.recogntion.parseq
+.. autofunction:: doctr.models.recognition.parseq
 
 .. autofunction:: doctr.models.recognition.recognition_predictor
diff --git a/docs/source/using_doctr/running_on_aws.rst b/docs/source/using_doctr/running_on_aws.rst
index a824f354e9..8a5e1a4cc4 100644
--- a/docs/source/using_doctr/running_on_aws.rst
+++ b/docs/source/using_doctr/running_on_aws.rst
@@ -1,7 +1,10 @@
 AWS Lambda
-========================
+==========
 
-AWS Lambda's (read more about Lambda https://aws.amazon.com/lambda/) security policy does not allow you to write anywhere outside `/tmp` directory.
-There are two things you need to do to make `doctr` work on lambda:
-1. Disable usage of `multiprocessing` package by setting `DOCTR_MULTIPROCESSING_DISABLE` enivronment variable to `TRUE`. You need to do this, because this package uses `/dev/shm` directory for shared memory.
-2. Change directory `doctr` uses for caching models. By default it's `~/.cache/doctr` which is outside of `/tmp` on AWS Lambda'. You can do this by setting `DOCTR_CACHE_DIR` enivronment variable.
+The security policy of `AWS Lambda <https://aws.amazon.com/lambda/>`_ restricts writing outside the ``/tmp`` directory.
+
+To make docTR work on Lambda, you need to perform the following two steps:
+
+1. Disable the usage of the ``multiprocessing`` package by setting the ``DOCTR_MULTIPROCESSING_DISABLE`` environment variable to ``TRUE``. This step is necessary because ``multiprocessing`` uses the ``/dev/shm`` directory for shared memory.
+
+2. Change the caching directory used by docTR for models. By default, it is set to ``~/.cache/doctr``, which is outside the ``/tmp`` directory on AWS Lambda. You can modify this by setting the ``DOCTR_CACHE_DIR`` environment variable.
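As a quick illustration of the two Lambda settings documented above, here is a minimal sketch of a handler module that applies them before docTR is imported; the cache path under `/tmp` and the handler body are illustrative assumptions, not part of this change:

```python
import os

# From the steps above: disable multiprocessing (it relies on /dev/shm)
# and move the model cache under /tmp, the only writable location on Lambda.
os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"
os.environ["DOCTR_CACHE_DIR"] = "/tmp/doctr_cache"  # illustrative path

# Import docTR only after the environment is configured
from doctr.models import ocr_predictor

# Build the predictor at module scope so it is reused across warm invocations
predictor = ocr_predictor(pretrained=True)


def handler(event, context):
    # Illustrative Lambda entry point; real input handling depends on your trigger
    return {"statusCode": 200}
```

In a real deployment, the same two variables can instead be set in the Lambda function's environment configuration.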
diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst
index 5c2d62fceb..1a46c2bb79 100644
--- a/docs/source/using_doctr/using_models.rst
+++ b/docs/source/using_doctr/using_models.rst
@@ -50,7 +50,7 @@ Explanations about the metrics being used are available in :ref:`metrics`.
 
 *Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities*
 
-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.12xlarge <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
 
 
 Detection predictors
@@ -151,7 +151,7 @@ While most of our recognition models were trained on our french vocab (cf. :ref:
 
 *Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
 
-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.12xlarge <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
 
 
 Recognition predictors
@@ -206,7 +206,7 @@ Explanations about the metrics being used are available in :ref:`metrics`.
 
 *Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
 
-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.12xlarge <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
 
 Since you may be looking for specific use cases, we also performed this benchmark on private datasets with various document types below. Unfortunately, we are not able to share those at the moment since they contain sensitive information.
@@ -330,14 +330,18 @@ For reference, here is the JSON export for the same `Document` as above::
         ]
     }
 
-To export the outpout as XML (hocr-format) you can use the `export_as_xml` method::
+To export the output as XML (hOCR format) you can use the `export_as_xml` method:
+
+.. code-block:: python
 
     xml_output = result.export_as_xml()
     for output in xml_output:
        xml_bytes_string = output[0]
        xml_element = output[1]
 
-For reference, here is a sample XML byte string output::
+For reference, here is a sample XML byte string output:
+
+.. code-block:: xml
@@ -360,3 +364,4 @@ For reference, here is a sample XML byte string output:
+
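For readers trying out the export workflow touched by the last hunks, here is a minimal end-to-end sketch using docTR's documented `ocr_predictor` and `Document.export()` API; the `sample.pdf` path is an illustrative assumption:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Run a pretrained end-to-end predictor on a document ("sample.pdf" is illustrative)
predictor = ocr_predictor(pretrained=True)
doc = DocumentFile.from_pdf("sample.pdf")
result = predictor(doc)

# JSON-style export, as shown in the section this diff reformats
json_export = result.export()
```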
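Continuing that sketch, the hOCR export whose snippet the diff converts to a `code-block` directive can be written to disk one page at a time; `result` comes from the previous snippet, and the `page_N.xml` naming is an illustrative assumption:

```python
# Each tuple holds the raw XML bytes and an ElementTree, as in the documented snippet
for page_idx, (xml_bytes_string, xml_element) in enumerate(result.export_as_xml()):
    # One hOCR file per page ("page_0.xml", "page_1.xml", ...)
    with open(f"page_{page_idx}.xml", "wb") as f:
        f.write(xml_bytes_string)
```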