docs: ✏️ fix typos in documentation (#1246)
odulcy authored Jul 10, 2023
1 parent 4e1985f commit 609cf4a
Showing 5 changed files with 37 additions and 12 deletions.
6 changes: 5 additions & 1 deletion CONTRIBUTING.md
@@ -40,6 +40,7 @@ If you are wondering how to do something with docTR, or a more general question,
Install all additional dependencies with the following command:

```shell
python -m pip install --upgrade pip
pip install -e .[dev]
pre-commit install
```
@@ -75,12 +76,15 @@ make style

### Modifying the documentation

In order to check locally your modifications to the documentation:
The current documentation is built using `sphinx` thanks to our CI.
You can build the documentation locally:

```shell
make docs-single-version
```

Please note that files that have not been modified will not be rebuilt. If you want to force a complete rebuild, you can delete the `_build` directory. Additionally, you may need to clear your web browser's cache to see the modifications.

You can now open your local version of the documentation located at `docs/_build/index.html` in your browser

## Let's connect
13 changes: 13 additions & 0 deletions docs/README.md
@@ -0,0 +1,13 @@
# Contribute to Documentation

Please have a look at our [contribution guide](../CONTRIBUTING.md) to see how to install
the development environment and how to generate the documentation.

To install only the `docs` environment, you can do:

```bash
# Make sure you are at the root of the repository before executing these commands
python -m pip install --upgrade pip
pip install -e .[tf] # or .[torch]
pip install -e .[docs]
```
2 changes: 1 addition & 1 deletion docs/source/modules/models.rst
@@ -73,7 +73,7 @@ doctr.models.recognition

.. autofunction:: doctr.models.recognition.vitstr_base

.. autofunction:: doctr.models.recogntion.parseq
.. autofunction:: doctr.models.recognition.parseq

.. autofunction:: doctr.models.recognition.recognition_predictor

13 changes: 8 additions & 5 deletions docs/source/using_doctr/running_on_aws.rst
@@ -1,7 +1,10 @@
AWS Lambda
========================
==========

AWS Lambda's (read more about Lambda https://aws.amazon.com/lambda/) security policy does not allow you to write anywhere outside `/tmp` directory.
There are two things you need to do to make `doctr` work on lambda:
1. Disable usage of `multiprocessing` package by setting `DOCTR_MULTIPROCESSING_DISABLE` enivronment variable to `TRUE`. You need to do this, because this package uses `/dev/shm` directory for shared memory.
2. Change directory `doctr` uses for caching models. By default it's `~/.cache/doctr` which is outside of `/tmp` on AWS Lambda'. You can do this by setting `DOCTR_CACHE_DIR` enivronment variable.
The security policy of `AWS Lambda <https://aws.amazon.com/lambda/>`_ restricts writing outside the ``/tmp`` directory.

To make docTR work on Lambda, you need to perform the following two steps:

1. Disable the usage of the ``multiprocessing`` package by setting the ``DOCTR_MULTIPROCESSING_DISABLE`` environment variable to ``TRUE``. This step is necessary because the package uses the ``/dev/shm`` directory for shared memory.

2. Change the caching directory used by docTR for models. By default, it is set to ``~/.cache/doctr``, which is outside the ``/tmp`` directory on AWS Lambda. You can modify this by setting the ``DOCTR_CACHE_DIR`` environment variable.
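
As an illustration that is not part of this commit, the two steps above could be sketched in Python as follows. Only the two variable names and the ``TRUE`` value come from the documentation; the helper name ``configure_doctr_for_lambda`` and the default cache path ``/tmp/doctr`` are hypothetical, and calling the helper before importing docTR is an assumption made so the settings are in place whenever the library reads them.

```python
import os
from pathlib import Path


def configure_doctr_for_lambda(cache_dir: str = "/tmp/doctr") -> Path:
    """Set the two environment variables described above.

    Hypothetical helper: the function name and default path are illustrative,
    only the variable names and the "TRUE" value come from the documentation.
    """
    # Step 1: disable multiprocessing, which relies on /dev/shm.
    os.environ["DOCTR_MULTIPROCESSING_DISABLE"] = "TRUE"

    # Step 2: point the model cache at a writable directory under /tmp.
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    os.environ["DOCTR_CACHE_DIR"] = str(cache_path)
    return cache_path


# Assumption: run this at the top of the handler module, before importing doctr.
# configure_doctr_for_lambda()
```

The same two variables can instead be set in the Lambda function configuration, which avoids any dependence on import order.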
15 changes: 10 additions & 5 deletions docs/source/using_doctr/using_models.rst
@@ -50,7 +50,7 @@ Explanations about the metrics being used are available in :ref:`metrics`.

*Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities*

FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
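
As an illustration that is not part of this commit, the measurement procedure described above (an untimed warmup, then an average over a fixed number of timed samples) might look like the sketch below. This is not the benchmark script used to produce the reported numbers; ``measure_fps`` and its ``predict`` argument are hypothetical names, where ``predict`` stands for one forward pass on a batch of size 1.

```python
import time
from typing import Callable


def measure_fps(predict: Callable[[], object], warmup: int = 100, samples: int = 1000) -> float:
    """Return the average number of calls processed per second after a warmup.

    Hypothetical helper: `predict` is assumed to run one forward pass on a
    single tensor (batch size 1).
    """
    # Warmup phase: calls are executed but not timed.
    for _ in range(warmup):
        predict()

    # Timed phase: average throughput over `samples` calls.
    start = time.perf_counter()
    for _ in range(samples):
        predict()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading for extremely fast callables.
    return samples / elapsed if elapsed > 0 else float("inf")
```

Timing the whole loop rather than each call keeps timer overhead out of the average.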


Detection predictors
@@ -151,7 +151,7 @@ While most of our recognition models were trained on our french vocab (cf. :ref:

*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*

FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).


Recognition predictors
@@ -206,7 +206,7 @@ Explanations about the metrics being used are available in :ref:`metrics`.

*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*

FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).

Since you may be looking for specific use cases, we also performed this benchmark on private datasets with various document types below. Unfortunately, we are not able to share those at the moment since they contain sensitive information.

@@ -330,14 +330,18 @@ For reference, here is the JSON export for the same `Document` as above::
]
}

To export the outpout as XML (hocr-format) you can use the `export_as_xml` method::
To export the outpout as XML (hocr-format) you can use the `export_as_xml` method:

.. code-block:: python

    xml_output = result.export_as_xml()
    for output in xml_output:
        xml_bytes_string = output[0]
        xml_element = output[1]

For reference, here is a sample XML byte string output::
For reference, here is a sample XML byte string output:

.. code-block:: xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
@@ -360,3 +364,4 @@ For reference, here is a sample XML byte string output::
</div>
</body>
</html>
