From f5349b87bb02e960278f44b05a76ed9bfe4b4877 Mon Sep 17 00:00:00 2001 From: felix Date: Thu, 29 Jun 2023 09:36:25 +0200 Subject: [PATCH 01/14] starting docs --- docs/source/using_doctr/using_models.rst | 204 ++++++++++++----------- 1 file changed, 106 insertions(+), 98 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index 1a46c2bb79..2a8ded7e69 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -23,6 +23,8 @@ Available architectures The following architectures are currently supported: * :py:meth:`linknet_resnet18 ` +* :py:meth:`linknet_resnet18 ` +* :py:meth:`linknet_resnet18 ` * :py:meth:`db_resnet50 ` * :py:meth:`db_mobilenet_v3_large ` @@ -34,15 +36,31 @@ We also provide 2 models working with any kind of rotated documents: For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -+------------------------------------------------------------------+----------------------------+----------------------------+---------+ -| | FUNSD | CORD | | -+=================================+=================+==============+============+===============+============+===============+=========+ -| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** | -+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| db_resnet50 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ ++-----------------------------------------------------------------------------------+----------------------------+----------------------------+---------+ +| | FUNSD | CORD | | ++================+=================================+=================+==============+============+===============+============+===============+=========+ +| **Backend** | **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | db_resnet50 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| Tensorflow | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | linknet_resnet18 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| Tensorflow | linknet_resnet18_rotation | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | 
++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | linknet_resnet34 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| Tensorflow | linknet_resnet50 | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | db_resnet34 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | db_resnet50 | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | db_resnet50_rotation | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). @@ -50,7 +68,7 @@ Explanations about the metrics being used are available in :ref:`metrics`. *Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities* -FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large `_ AWS instance (CPU Xeon Platinum 8275L). +FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. Detection predictors @@ -58,11 +76,13 @@ Detection predictors :py:meth:`detection_predictor ` wraps your detection model to make it easily useable with your favorite deep learning framework seamlessly. - >>> import numpy as np - >>> from doctr.models import detection_predictor - >>> predictor = detection_predictor('db_resnet50') - >>> dummy_img = (255 * np.random.rand(800, 600, 3)).astype(np.uint8) - >>> out = model([dummy_img]) +.. 
code:: python3 + + import numpy as np + from doctr.models import detection_predictor + predictor = detection_predictor('db_resnet50') + dummy_img = (255 * np.random.rand(800, 600, 3)).astype(np.uint8) + out = model([dummy_img]) You can pass specific boolean arguments to the predictor: @@ -72,8 +92,10 @@ You can pass specific boolean arguments to the predictor: For instance, this snippet will instantiates a detection predictor able to detect text on rotated documents while preserving the aspect ratio: - >>> from doctr.models import detection_predictor - >>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True) +.. code:: python3 + + from doctr.models import detection_predictor + predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True) NB: for the moment, `db_resnet50_rotation` is pretrained in Pytorch only and `linknet_resnet18_rotation` in Tensorflow only. @@ -94,75 +116,82 @@ The following architectures are currently supported: * :py:meth:`crnn_mobilenet_v3_large ` * :py:meth:`sar_resnet31 ` * :py:meth:`master ` +* :py:meth:`vitstr_small ` +* :py:meth:`vitstr_base ` +* :py:meth:`parseq ` For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -.. list-table:: Text recognition model zoo - :header-rows: 1 - - * - Architecture - - Input shape - - # params - - FUNSD - - CORD - - FPS - * - crnn_vgg16_bn - - (32, 128, 3) - - 15.8M - - 87.18 - - 92.93 - - 12.8 - * - crnn_mobilenet_v3_small - - (32, 128, 3) - - 2.1M - - 86.21 - - 90.56 - - - * - crnn_mobilenet_v3_large - - (32, 128, 3) - - 4.5M - - 86.95 - - 92.03 - - - * - sar_resnet31 - - (32, 128, 3) - - 56.2M - - **87.70** - - **93.41** - - 2.7 - * - master - - (32, 128, 3) - - 67.7M - - 87.62 - - 93.27 - - ++-----------------------------------------------------------------------------------+----------------------------+----------------------------+---------+ +| | FUNSD | CORD | | ++================+=================================+=================+==============+============+===============+============+===============+=========+ +| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **FPS** | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | crnn_vgg16_bn | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| Tensorflow | crnn_mobilenet_v3_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| Tensorflow | master | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | sar_resnet31 | (32, 128, 3) | 25.2 M | 
82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| Tensorflow | vitstr_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | vitstr_base | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| TensorFlow | parseq | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | crnn_vgg16_bn | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | master | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | sar_resnet31 | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | vitstr_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | vitstr_base | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ +| PyTorch | parseq | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ + + All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). Explanations about the metric being used (exact match) are available in :ref:`metrics`. While most of our recognition models were trained on our french vocab (cf. :ref:`vocabs`), you can easily access the vocab of any model as follows: - >>> from doctr.models import recognition_predictor - >>> predictor = recognition_predictor('crnn_vgg16_bn') - >>> print(predictor.model.cfg['vocab']) +.. 
code:: python3
+
+ from doctr.models import recognition_predictor
+ predictor = recognition_predictor('crnn_vgg16_bn')
+ print(predictor.model.cfg['vocab'])

*Disclaimer: both FUNSD subsets combined have 30595 word-level crops which might not be representative enough of the model capabilities*

-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large `_ AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`.


Recognition predictors
^^^^^^^^^^^^^^^^^^^^^^
:py:meth:`recognition_predictor ` wraps your recognition model to make it easily usable with your favorite deep learning framework seamlessly.

- >>> import numpy as np
- >>> from doctr.models import recognition_predictor
- >>> predictor = recognition_predictor('crnn_vgg16_bn')
- >>> dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
- >>> out = model([dummy_img])
+.. code:: python3
+
+ import numpy as np
+ from doctr.models import recognition_predictor
+ predictor = recognition_predictor('crnn_vgg16_bn')
+ dummy_img = (255 * np.random.rand(50, 150, 3)).astype(np.uint8)
+ out = predictor([dummy_img])


End-to-End OCR
@@ -206,43 +235,20 @@ Explanations about the metrics being used are available in :ref:`metrics`.

*Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities*

-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.x12large `_ AWS instance (CPU Xeon Platinum 8275L).
-
-Since you may be looking for specific use cases, we also performed this benchmark on private datasets with various document types below. Unfortunately, we are not able to share those at the moment since they contain sensitive information.
- - -+----------------------------------------------+----------------------------+----------------------------+----------------------------+----------------------------+----------------------------+----------------------------+ -| | Receipts | Invoices | IDs | US Tax Forms | Resumes | Road Fines | -+==============================================+============+===============+============+===============+============+===============+============+===============+============+===============+============+===============+ -| **Architecture** | **Recall** | **Precision** | **Recall** | **Precision** | **Recall** | **Precision** | **Recall** | **Precision** | **Recall** | **Precision** | **Recall** | **Precision** | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| db_resnet50 + crnn_vgg16_bn (ours) | 78.70 | 81.12 | 65.80 | 70.70 | 50.25 | 51.78 | 79.08 | 92.83 | | | | | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| db_resnet50 + master (ours) | **79.00** | **81.42** | 65.57 | 69.86 | 51.34 | 52.90 | 78.86 | 92.57 | | | | | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| db_resnet50 + sar_resnet31 (ours) | 78.94 | 81.37 | 65.89 | **70.79** | **51.78** | **53.35** | 79.04 | 92.78 | | | | | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| db_resnet50 + crnn_mobilenet_v3_small (ours) | 76.81 | 79.15 | 64.89 | 69.61 | 45.03 | 46.38 | 78.96 | 92.11 | 85.91 | 87.20 | 84.85 | 85.86 | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| db_resnet50 + crnn_mobilenet_v3_large (ours) | 78.01 | 80.39 | 65.36 | 70.11 | 48.00 | 49.43 | 79.39 | 92.62 | 87.68 | 89.00 | 85.65 | 86.67 | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| db_mobilenet_v3_large + crnn_vgg16_bn (ours) | 78.36 | 74.93 | 63.04 | 68.41 | 39.36 | 41.75 | 72.14 | 89.97 | | | | | -+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+ -| Gvision doc. 
text detection | 68.91 | 59.89 | 63.20 | 52.85 | 43.70 | 29.21 | 69.79 | 65.68 | | | | |
+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+
| AWS textract | 75.77 | 77.70 | **70.47** | 69.13 | 46.39 | 43.32 | **84.31** | **98.11** | | | | |
+----------------------------------------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+------------+---------------+
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`.


Two-stage approaches
^^^^^^^^^^^^^^^^^^^^
Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produce cropped images that will be passed into the text recognition block. Everything is wrapped up with :py:meth:`ocr_predictor `.

- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
- >>> input_page = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
- >>> out = model([input_page])
+.. code:: python3
+
+ import numpy as np
+ from doctr.models import ocr_predictor
+ model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ input_page = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
+ out = model([input_page])


You can pass specific boolean arguments to the predictor:

@@ -257,8 +263,10 @@ Those 3 are going straight to the detection predictor, as mentioned above (in th
For instance, this snippet instantiates an end-to-end ocr_predictor working with rotated documents, which preserves the aspect ratio of the documents, and returns polygons:

- >>> from doctr.model import ocr_predictor
- >>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
+.. code:: python3
+
+ from doctr.models import ocr_predictor
+ model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)

What should I do with the output?
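The hunk above ends just before the output-handling section, so as a rough, non-normative illustration of what can be done with the result: the predictor returns a document object whose pages, blocks, lines and words can be traversed, and which can be exported to a plain dictionary. The attribute names below (``pages``, ``blocks``, ``lines``, ``words``, ``export``) follow the docTR API at the time of writing and should be checked against the version being documented.

.. code:: python3

    import numpy as np
    from doctr.models import ocr_predictor

    model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
    input_page = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
    result = model([input_page])

    # Walk the nested structure: pages -> blocks -> lines -> words
    for block in result.pages[0].blocks:
        for line in block.lines:
            for word in line.words:
                print(word.value, word.confidence)

    # Export the whole document as a JSON-serializable nested dict
    json_export = result.export()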
From 75d5c4f287652fd890ed9926de8064017ffe9c09 Mon Sep 17 00:00:00 2001 From: felix Date: Thu, 29 Jun 2023 11:52:45 +0200 Subject: [PATCH 02/14] update --- docs/source/using_doctr/using_models.rst | 56 +++++++++++++----------- 1 file changed, 31 insertions(+), 25 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index 2a8ded7e69..23e8390f04 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -36,31 +36,37 @@ We also provide 2 models working with any kind of rotated documents: For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -+-----------------------------------------------------------------------------------+----------------------------+----------------------------+---------+ -| | FUNSD | CORD | | -+================+=================================+=================+==============+============+===============+============+===============+=========+ -| **Backend** | **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | db_resnet50 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| Tensorflow | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | linknet_resnet18 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| Tensorflow | linknet_resnet18_rotation | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | linknet_resnet34 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| Tensorflow | linknet_resnet50 | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | db_resnet34 | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | db_resnet50 | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | db_resnet50_rotation | (1024, 1024, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | 
-+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ ++-----------------------------------------------------------------------------------+----------------------------+----------------------------+-------------+ +| | FUNSD | CORD | | ++================+=================================+=================+==============+============+===============+============+===============+=============+ +| **Backend** | **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **sec/it** | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| TensorFlow | db_resnet50 | (1024, 1024, 3) | 25.2 M | 81.22 | 86.66 | 92.46 | 89.62 | 1.2 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| Tensorflow | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 78.27 | 82.77 | 80.99 | 66.57 | 0.5 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| TensorFlow | linknet_resnet18 | (1024, 1024, 3) | 11.5 M | 78.23 | 83.77 | 82.88 | 82.42 | 0.7 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| Tensorflow | linknet_resnet18_rotation | (1024, 1024, 3) | 11.5 M | 81.12 | 82.13 | 83.55 | 80.14 | 0.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| TensorFlow | linknet_resnet34 | (1024, 1024, 3) | 21.6 M | 82.14 | 87.64 | 85.55 | 86.02 | 0.8 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| Tensorflow | linknet_resnet50 | (1024, 1024, 3) | 28.8 M | 79.00 | 84.79 | 85.89 | 65.75 | 1.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | db_resnet34 | (1024, 1024, 3) | 22.4 M | | | | | 0.8 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | db_resnet50 | (1024, 1024, 3) | 25.4 M | 79.17 | 86.31 | 92.96 | 91.23 | 1.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | db_resnet50_rotation | (1024, 1024, 3) | 25.4 M | 83.30 | 91.07 | 91.63 | 90.53 | 1.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 80.06 | 84.12 | 80.51 | 66.51 | 0.5 | 
++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | linknet_resnet18 | (1024, 1024, 3) | 11.5 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | linknet_resnet34 | (1024, 1024, 3) | 21.6 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ +| PyTorch | linknet_resnet50 | (1024, 1024, 3) | 28.8 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). From 274edd91601ff3768df90b1d5ccdb47181c50670 Mon Sep 17 00:00:00 2001 From: felix Date: Thu, 29 Jun 2023 16:16:14 +0200 Subject: [PATCH 03/14] further updates --- docs/source/using_doctr/using_models.rst | 186 +++++++++++------------ 1 file changed, 87 insertions(+), 99 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index 23e8390f04..fc94351634 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -23,8 +23,8 @@ Available architectures The following architectures are currently supported: * :py:meth:`linknet_resnet18 ` -* :py:meth:`linknet_resnet18 ` -* :py:meth:`linknet_resnet18 ` +* :py:meth:`linknet_resnet34 ` +* :py:meth:`linknet_resnet50 ` * :py:meth:`db_resnet50 ` * :py:meth:`db_mobilenet_v3_large ` @@ -36,37 +36,37 @@ We also provide 2 models working with any kind of rotated documents: For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -+-----------------------------------------------------------------------------------+----------------------------+----------------------------+-------------+ -| | FUNSD | CORD | | -+================+=================================+=================+==============+============+===============+============+===============+=============+ -| **Backend** | **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **sec/it** | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| TensorFlow | db_resnet50 | (1024, 1024, 3) | 25.2 M | 81.22 | 86.66 | 92.46 | 89.62 | 1.2 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| Tensorflow | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 78.27 | 82.77 | 80.99 | 66.57 | 0.5 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| TensorFlow | linknet_resnet18 | (1024, 1024, 3) | 11.5 M | 78.23 | 83.77 | 82.88 | 82.42 | 0.7 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| Tensorflow | linknet_resnet18_rotation | (1024, 1024, 3) | 11.5 M | 81.12 | 82.13 | 83.55 | 80.14 | 0.6 | 
-+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| TensorFlow | linknet_resnet34 | (1024, 1024, 3) | 21.6 M | 82.14 | 87.64 | 85.55 | 86.02 | 0.8 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| Tensorflow | linknet_resnet50 | (1024, 1024, 3) | 28.8 M | 79.00 | 84.79 | 85.89 | 65.75 | 1.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | db_resnet34 | (1024, 1024, 3) | 22.4 M | | | | | 0.8 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | db_resnet50 | (1024, 1024, 3) | 25.4 M | 79.17 | 86.31 | 92.96 | 91.23 | 1.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | db_resnet50_rotation | (1024, 1024, 3) | 25.4 M | 83.30 | 91.07 | 91.63 | 90.53 | 1.6 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 80.06 | 84.12 | 80.51 | 66.51 | 0.5 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | linknet_resnet18 | (1024, 1024, 3) | 11.5 M | | | | | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | linknet_resnet34 | (1024, 1024, 3) | 21.6 M | | | | | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ -| PyTorch | linknet_resnet50 | (1024, 1024, 3) | 28.8 M | | | | | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+-------------+ ++-----------------------------------------------------------------------------------+----------------------------+----------------------------+--------------------+ +| | FUNSD | CORD | | ++================+=================================+=================+==============+============+===============+============+===============+====================+ +| **Backend** | **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **sec/it (B: 1)** | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | db_resnet50 | (1024, 1024, 3) | 25.2 M | 81.22 | 86.66 | 92.46 | 89.62 | 1.2 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| Tensorflow | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 78.27 | 82.77 | 80.99 | 66.57 | 0.5 | 
++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | linknet_resnet18 | (1024, 1024, 3) | 11.5 M | 78.23 | 83.77 | 82.88 | 82.42 | 0.7 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| Tensorflow | linknet_resnet18_rotation | (1024, 1024, 3) | 11.5 M | 81.12 | 82.13 | 83.55 | 80.14 | 0.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | linknet_resnet34 | (1024, 1024, 3) | 21.6 M | 82.14 | 87.64 | 85.55 | 86.02 | 0.8 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| Tensorflow | linknet_resnet50 | (1024, 1024, 3) | 28.8 M | 79.00 | 84.79 | 85.89 | 65.75 | 1.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | db_resnet34 | (1024, 1024, 3) | 22.4 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | db_resnet50 | (1024, 1024, 3) | 25.4 M | 79.17 | 86.31 | 92.96 | 91.23 | 1.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | db_resnet50_rotation | (1024, 1024, 3) | 25.4 M | 83.30 | 91.07 | 91.63 | 90.53 | 1.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | db_mobilenet_v3_large | (1024, 1024, 3) | 4.2 M | 80.06 | 84.12 | 80.51 | 66.51 | 0.5 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | linknet_resnet18 | (1024, 1024, 3) | 11.5 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | linknet_resnet34 | (1024, 1024, 3) | 21.6 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | linknet_resnet50 | (1024, 1024, 3) | 28.8 M | | | | | | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). @@ -74,7 +74,7 @@ Explanations about the metrics being used are available in :ref:`metrics`. *Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities* -FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. 
Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. +Seconds per iteration is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. Detection predictors @@ -130,44 +130,43 @@ The following architectures are currently supported: For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -+-----------------------------------------------------------------------------------+----------------------------+----------------------------+---------+ -| | FUNSD | CORD | | -+================+=================================+=================+==============+============+===============+============+===============+=========+ -| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **FPS** | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | crnn_vgg16_bn | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| Tensorflow | crnn_mobilenet_v3_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| Tensorflow | master | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | sar_resnet31 | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| Tensorflow | vitstr_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | vitstr_base | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| TensorFlow | parseq | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | crnn_vgg16_bn | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | 
-+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | master | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | sar_resnet31 | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | vitstr_small | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | vitstr_base | (32, 128, 3) | 25.2 M | 82.14 | 87.64 | 92.49 | 89.66 | 2.1 | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ -| PyTorch | parseq | (32, 128, 3) | 4.2 M | 79.35 | 84.03 | 81.14 | 66.85 | | -+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+ - ++-----------------------------------------------------------------------------------+----------------------------+----------------------------+--------------------+ +| | FUNSD | CORD | | ++================+=================================+=================+==============+============+===============+============+===============+====================+ +| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **sec/it (B: 64)** | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | | | | | 0.9 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| Tensorflow | crnn_mobilenet_v3_small | (32, 128, 3) | 2.1 M | | | | | 0.25 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 4.5 M | | | | | 0.34 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| Tensorflow | master | (32, 128, 3) | 58.8 M | | | | | 22.3 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | sar_resnet31 | (32, 128, 3) | 57.2 M | | | | | 7.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| Tensorflow | vitstr_small | (32, 128, 3) | 21.4 M | | | | | 2.0 | 
++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | vitstr_base | (32, 128, 3) | 85.2 M | | | | | 5.8 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| TensorFlow | parseq | (32, 128, 3) | 23.8 M | | | | | 3.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | | | | | 0.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 4.5 M | | | | | 0.05 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 2.1 M | | | | | 0.08 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | master | (32, 128, 3) | 58.7 M | | | | | 17.6 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | sar_resnet31 | (32, 128, 3) | 55.4 M | | | | | 4.9 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | vitstr_small | (32, 128, 3) | 21.4 M | | | | | 1.5 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | vitstr_base | (32, 128, 3) | 85.2 M | | | | | 4.1 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ +| PyTorch | parseq | (32, 128, 3) | 23.8 M | | | | | 2.2 | ++----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). @@ -184,7 +183,7 @@ While most of our recognition models were trained on our french vocab (cf. :ref: *Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities* -FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. +Seconds per iteration (with a batch size of 64) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. 
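As a rough sketch of how such a seconds-per-iteration figure can be reproduced (this is a simplified illustration, not the exact benchmarking script used for the table: the warmup and the number of timed runs are shortened, and random crops stand in for real data):

.. code:: python3

    import time

    import numpy as np
    from doctr.models import recognition_predictor

    predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True)
    # A batch of 64 random word crops standing in for real data
    batch = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8) for _ in range(64)]

    # Short warmup so the first (slower) iterations are not timed
    for _ in range(10):
        _ = predictor(batch)

    # Average seconds per iteration over the timed runs
    n_runs = 100
    start = time.perf_counter()
    for _ in range(n_runs):
        _ = predictor(batch)
    print(f"{(time.perf_counter() - start) / n_runs:.2f} sec/it")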
Recognition predictors @@ -208,41 +207,30 @@ The task consists of both localizing and transcribing textual elements in a give Available architectures ^^^^^^^^^^^^^^^^^^^^^^^ -You can use any combination of detection and recognition models supporte by docTR. +You can use any combination of detection and recognition models supported by docTR. For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -+----------------------------------------+--------------------------------------+--------------------------------------+ -| | FUNSD | CORD | -+========================================+============+===============+=========+============+===============+=========+ -| **Architecture** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| db_resnet50 + crnn_vgg16_bn | 71.25 | 76.02 | 0.85 | 84.00 | 81.42 | 1.6 | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| db_resnet50 + master | 71.03 | 76.06 | | 84.49 | 81.94 | | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| db_resnet50 + sar_resnet31 | 71.25 | 76.29 | 0.27 | 84.50 | **81.96** | 0.83 | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| db_resnet50 + crnn_mobilenet_v3_small | 69.85 | 74.80 | | 80.85 | 78.42 | 0.83 | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| db_resnet50 + crnn_mobilenet_v3_large | 70.57 | 75.57 | | 82.57 | 80.08 | 0.83 | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| db_mobilenet_v3_large + crnn_vgg16_bn | 67.73 | 71.73 | | 71.65 | 59.03 | | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| Gvision text detection | 59.50 | 62.50 | | 75.30 | 70.00 | | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| Gvision doc. text detection | 64.00 | 53.30 | | 68.90 | 61.10 | | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ -| AWS textract | **78.10** | **83.00** | | **87.50** | 66.00 | | -+----------------------------------------+------------+---------------+---------+------------+---------------+---------+ ++--------------------------------------------------+----------------------------+----------------------------+ +| | FUNSD | CORD | ++================+=================================+============================+============+===============+ +| **Backend** | **Architecture** | **Recall** | **Precision** | **Recall** | **Precision** | ++----------------+---------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 | 81.22 | 86.66 | 92.46 | 89.62 | ++----------------+---------------------------------+------------+---------------+------------+---------------+ +| None | Gvision text detection | 59.50 | 62.50 | 75.30 | 59.03 | ++----------------+---------------------------------+------------+---------------+------------+---------------+ +| None | Gvision doc. 
text detection | 64.00 | 53.30 | 68.90 | 61.10 | ++----------------+---------------------------------+------------+---------------+------------+---------------+ +| None | AWS textract | 78.10 | 83.00 | 87.50 | 66.00 | ++----------------+---------------------------------+------------+---------------+------------+---------------+ + All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). Explanations about the metrics being used are available in :ref:`metrics`. *Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities* -FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. - Two-stage approaches ^^^^^^^^^^^^^^^^^^^^ From 02d6b637e46400a8ab003f33444216386f2e9a6a Mon Sep 17 00:00:00 2001 From: felix Date: Fri, 30 Jun 2023 12:15:02 +0200 Subject: [PATCH 04/14] updates --- docs/source/using_doctr/using_models.rst | 44 +++++++++++++----------- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index fc94351634..aa4d093bb2 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -74,7 +74,7 @@ Explanations about the metrics being used are available in :ref:`metrics`. *Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities* -Seconds per iteration is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. +Seconds per iteration (with a batch size of 1) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. 
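The Backend column in the tables above distinguishes the TensorFlow and PyTorch implementations of each architecture. When both frameworks are installed, one way to pin the backend is the ``USE_TF`` / ``USE_TORCH`` environment variables, which docTR reads at import time; a minimal sketch, assuming both frameworks are available:

.. code:: python3

    import os

    # Force the PyTorch implementation before importing doctr
    # (set USE_TF to "1" instead to force the TensorFlow one)
    os.environ["USE_TORCH"] = "1"

    from doctr.models import detection_predictor

    predictor = detection_predictor('db_resnet50', pretrained=True)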
Detection predictors @@ -133,11 +133,11 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +-----------------------------------------------------------------------------------+----------------------------+----------------------------+--------------------+ | | FUNSD | CORD | | +================+=================================+=================+==============+============+===============+============+===============+====================+ -| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **sec/it (B: 64)** | +| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **sec/it (B: 1)** | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| TensorFlow | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | | | | | 0.9 | +| TensorFlow | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | 88.12 | 88.85 | 94.68 | 95.10 | 0.9 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| Tensorflow | crnn_mobilenet_v3_small | (32, 128, 3) | 2.1 M | | | | | 0.25 | +| Tensorflow | crnn_mobilenet_v3_small | (32, 128, 3) | 2.1 M | 86.88 | 87.61 | 92.28 | 92.73 | 0.25 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 4.5 M | | | | | 0.34 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -151,11 +151,11 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | parseq | (32, 128, 3) | 23.8 M | | | | | 3.6 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| PyTorch | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | | | | | 0.6 | +| PyTorch | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | 86.54 | 87.41 | 94.29 | 94.69 | 0.6 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 4.5 M | | | | | 0.05 | +| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 4.5 M | 87.25 | 87.99 | 93.91 | 94.34 | 0.05 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 2.1 M | | | | | 0.08 | +| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 2.1 M | 87.38 | 88.09 | 94.46 | 94.92 | 0.08 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | PyTorch | master | (32, 128, 3) | 58.7 M | | | | | 17.6 | 
+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -183,7 +183,7 @@ While most of our recognition models were trained on our french vocab (cf. :ref: *Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities* -Seconds per iteration (with a batch size of 64) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. +Seconds per iteration (with a batch size of 1) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. Recognition predictors @@ -211,19 +211,21 @@ You can use any combination of detection and recognition models supported by doc For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: -+--------------------------------------------------+----------------------------+----------------------------+ -| | FUNSD | CORD | -+================+=================================+============================+============+===============+ -| **Backend** | **Architecture** | **Recall** | **Precision** | **Recall** | **Precision** | -+----------------+---------------------------------+------------+---------------+------------+---------------+ -| TensorFlow | db_resnet50 | 81.22 | 86.66 | 92.46 | 89.62 | -+----------------+---------------------------------+------------+---------------+------------+---------------+ -| None | Gvision text detection | 59.50 | 62.50 | 75.30 | 59.03 | -+----------------+---------------------------------+------------+---------------+------------+---------------+ -| None | Gvision doc. text detection | 64.00 | 53.30 | 68.90 | 61.10 | -+----------------+---------------------------------+------------+---------------+------------+---------------+ -| None | AWS textract | 78.10 | 83.00 | 87.50 | 66.00 | -+----------------+---------------------------------+------------+---------------+------------+---------------+ ++---------------------------------------------------------------------------+----------------------------+----------------------------+ +| | FUNSD | CORD | ++================+==========================================================+============================+============+===============+ +| **Backend** | **Architecture** | **Recall** | **Precision** | **Recall** | **Precision** | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + crnn_vgg16_bn | 70.82 | 75.56 | 83.97 | 81.40 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + crnn_vgg16_bn | 67.82 | 73.35 | 84.84 | 83.27 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| None | Gvision text detection | 59.50 | 62.50 | 75.30 | 59.03 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| None | Gvision doc. 
text detection | 64.00 | 53.30 | 68.90 | 61.10 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| None | AWS textract | 78.10 | 83.00 | 87.50 | 66.00 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). From 8f13762d7914687fa0a6e241e8fe8d885b889c45 Mon Sep 17 00:00:00 2001 From: felix Date: Tue, 11 Jul 2023 10:02:05 +0200 Subject: [PATCH 05/14] rebase From d5d42db9d70197bb4c47102d7a323b6ccf2d63ba Mon Sep 17 00:00:00 2001 From: felix Date: Tue, 11 Jul 2023 14:27:26 +0200 Subject: [PATCH 06/14] add master tf report --- docs/source/using_doctr/using_models.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index aa4d093bb2..10ce23a9f8 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -133,7 +133,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +-----------------------------------------------------------------------------------+----------------------------+----------------------------+--------------------+ | | FUNSD | CORD | | +================+=================================+=================+==============+============+===============+============+===============+====================+ -| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **sec/it (B: 1)** | +| **Backend** | **Architecture** | **Input shape** | **# params** | **Exact** | **Partial** | **Exact** | **Partial** | **sec/it (B: 64)** | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | 88.12 | 88.85 | 94.68 | 95.10 | 0.9 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -141,7 +141,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 4.5 M | | | | | 0.34 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| Tensorflow | master | (32, 128, 3) | 58.8 M | | | | | 22.3 | +| Tensorflow | master | (32, 128, 3) | 58.8 M | 87.44 | 88.21 | 93.83 | 94.25 | 22.3 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | sar_resnet31 | (32, 128, 3) | 57.2 M | | | | | 7.1 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -183,7 +183,7 @@ While most of our recognition models were trained on our french vocab (cf. 
:ref: *Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities* -Seconds per iteration (with a batch size of 1) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. +Seconds per iteration (with a batch size of 64) is computed after a warmup phase of 100 tensors, by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz`. Recognition predictors From c2f0f5f24d90e496d0d661ec40999566801f7f99 Mon Sep 17 00:00:00 2001 From: felix Date: Mon, 24 Jul 2023 09:17:38 +0200 Subject: [PATCH 07/14] rebase From 98752b65f61c8b87fdff0053bdd7b226e1746fe2 Mon Sep 17 00:00:00 2001 From: felix Date: Fri, 28 Jul 2023 13:32:39 +0200 Subject: [PATCH 08/14] eval tf vitstr-small --- docs/source/using_doctr/using_models.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index 10ce23a9f8..d2c44fd3fa 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -145,7 +145,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | sar_resnet31 | (32, 128, 3) | 57.2 M | | | | | 7.1 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| Tensorflow | vitstr_small | (32, 128, 3) | 21.4 M | | | | | 2.0 | +| Tensorflow | vitstr_small | (32, 128, 3) | 21.4 M | 83.01 | 83.84 | 86.57 | 87.00 | 2.0 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | TensorFlow | vitstr_base | (32, 128, 3) | 85.2 M | | | | | 5.8 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -218,6 +218,8 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + crnn_vgg16_bn | 70.82 | 75.56 | 83.97 | 81.40 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + vitstr_small | 64.58 | 68.91 | 74.66 | 72.37 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | PyTorch | db_resnet50 + crnn_vgg16_bn | 67.82 | 73.35 | 84.84 | 83.27 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | None | Gvision text detection | 59.50 | 62.50 | 75.30 | 59.03 | @@ -368,4 +370,3 @@ For reference, here is a sample XML byte string output: - From b7a71e6b1f8dece9766610720fe1a8a19bbac016 Mon Sep 17 00:00:00 2001 From: felix Date: Tue, 15 Aug 2023 16:08:17 +0200 Subject: [PATCH 
09/14] add further benchmarks --- docs/source/using_doctr/using_models.rst | 32 +++++++++++++++++++++--- scripts/evaluate.py | 1 + scripts/evaluate_kie.py | 1 + 3 files changed, 31 insertions(+), 3 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index d2c44fd3fa..208a973a06 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -143,7 +143,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | Tensorflow | master | (32, 128, 3) | 58.8 M | 87.44 | 88.21 | 93.83 | 94.25 | 22.3 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| TensorFlow | sar_resnet31 | (32, 128, 3) | 57.2 M | | | | | 7.1 | +| TensorFlow | sar_resnet31 | (32, 128, 3) | 57.2 M | 87.67 | 88.48 | 94.21 | 94.66 | 7.1 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | Tensorflow | vitstr_small | (32, 128, 3) | 21.4 M | 83.01 | 83.84 | 86.57 | 87.00 | 2.0 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -153,9 +153,9 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | PyTorch | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | 86.54 | 87.41 | 94.29 | 94.69 | 0.6 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 4.5 M | 87.25 | 87.99 | 93.91 | 94.34 | 0.05 | +| PyTorch | crnn_mobilenet_v3_small | (32, 128, 3) | 2.1 M | 87.25 | 87.99 | 93.91 | 94.34 | 0.05 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 2.1 M | 87.38 | 88.09 | 94.46 | 94.92 | 0.08 | +| PyTorch | crnn_mobilenet_v3_large | (32, 128, 3) | 4.5 M | 87.38 | 88.09 | 94.46 | 94.92 | 0.08 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | PyTorch | master | (32, 128, 3) | 58.7 M | | | | | 17.6 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -218,10 +218,36 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + crnn_vgg16_bn | 70.82 | 75.56 | 83.97 | 81.40 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + crnn_mobilenet_v3_small | 69.63 | 74.29 | 81.08 | 
78.59 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + crnn_mobilenet_v3_large | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + sar_resnet31 | 69.42 | 74.04 | 80.67 | 78.21 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + master | 68.75 | 73.76 | 78.56 | 76.24 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + vitstr_small | 64.58 | 68.91 | 74.66 | 72.37 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + vitstr_base | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| TensorFlow | db_resnet50 + parseq | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | PyTorch | db_resnet50 + crnn_vgg16_bn | 67.82 | 73.35 | 84.84 | 83.27 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + crnn_mobilenet_v3_small | 67.89 | 74.01 | 84.43 | 82.85 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + crnn_mobilenet_v3_large | 68.45 | 74.63 | 84.86 | 83.27 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + sar_resnet31 | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + master | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + vitstr_small | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + vitstr_base | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| PyTorch | db_resnet50 + parseq | | | | | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | None | Gvision text detection | 59.50 | 62.50 | 75.30 | 59.03 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | None | Gvision doc. 
text detection | 64.00 | 53.30 | 68.90 | 61.10 | diff --git a/scripts/evaluate.py b/scripts/evaluate.py index f4e8aaefea..20da633bdf 100644 --- a/scripts/evaluate.py +++ b/scripts/evaluate.py @@ -40,6 +40,7 @@ def main(args): args.recognition, pretrained=True, reco_bs=args.batch_size, + preserve_aspect_ratio=False, assume_straight_pages=not args.rotation, ) diff --git a/scripts/evaluate_kie.py b/scripts/evaluate_kie.py index 1aaf3f9ae8..3d16197d98 100644 --- a/scripts/evaluate_kie.py +++ b/scripts/evaluate_kie.py @@ -42,6 +42,7 @@ def main(args): args.recognition, pretrained=True, reco_bs=args.batch_size, + preserve_aspect_ratio=False, assume_straight_pages=not args.rotation, ) From 315f03750081b0a62cf74d7dfeca6e44f4fc7d2b Mon Sep 17 00:00:00 2001 From: felix Date: Fri, 25 Aug 2023 09:58:31 +0200 Subject: [PATCH 10/14] add Azure form recognizer benchmark --- docs/source/using_doctr/using_models.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index 208a973a06..bc3ec04066 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -254,6 +254,8 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | None | AWS textract | 78.10 | 83.00 | 87.50 | 66.00 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ +| None | Azure Form Recognizer (v3.2) | 79.42 | 85.89 | 89.62 | 88.93 | ++----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`). 
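[editor's note] The ``preserve_aspect_ratio=False`` argument added to ``scripts/evaluate.py`` and ``scripts/evaluate_kie.py`` above is passed straight into the predictor construction used for these end-to-end benchmarks. As a rough sketch of how the pieces fit together (the end-to-end builder is assumed to be ``ocr_predictor``, and the architecture names and batch size below are placeholders rather than the scripts' defaults), the patched call looks like this:

.. code:: python

    from doctr.models import ocr_predictor

    # Sketch only: argument values are illustrative, mirroring the kwargs shown in the diff.
    predictor = ocr_predictor(
        "db_resnet50",                 # detection architecture
        "crnn_vgg16_bn",               # recognition architecture
        pretrained=True,
        reco_bs=32,                    # mirrors reco_bs=args.batch_size
        preserve_aspect_ratio=False,   # the flag introduced by this patch
        assume_straight_pages=True,    # mirrors assume_straight_pages=not args.rotation
    )

Any detection/recognition pair from the tables above can be substituted for the two architecture names.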
From adbe09ff412f59a64080c1b6ee1bc7654f713af9 Mon Sep 17 00:00:00 2001 From: felix Date: Mon, 28 Aug 2023 16:28:44 +0200 Subject: [PATCH 11/14] add crnn_mobilenet_v3_large benchmark --- docs/source/using_doctr/using_models.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index bc3ec04066..41e54b77f6 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -139,7 +139,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | Tensorflow | crnn_mobilenet_v3_small | (32, 128, 3) | 2.1 M | 86.88 | 87.61 | 92.28 | 92.73 | 0.25 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 4.5 M | | | | | 0.34 | +| TensorFlow | crnn_mobilenet_v3_large | (32, 128, 3) | 4.5 M | 87.44 | 88.12 | 94.14 | 94.55 | 0.34 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | Tensorflow | master | (32, 128, 3) | 58.8 M | 87.44 | 88.21 | 93.83 | 94.25 | 22.3 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -220,7 +220,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + crnn_mobilenet_v3_small | 69.63 | 74.29 | 81.08 | 78.59 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ -| TensorFlow | db_resnet50 + crnn_mobilenet_v3_large | | | | | +| TensorFlow | db_resnet50 + crnn_mobilenet_v3_large | 70.01 | 74.70 | 83.28 | 80.73 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + sar_resnet31 | 69.42 | 74.04 | 80.67 | 78.21 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ From 1b45dbde2af301413e1aece25cab649c3cb98519 Mon Sep 17 00:00:00 2001 From: felix Date: Wed, 6 Sep 2023 09:57:10 +0200 Subject: [PATCH 12/14] missing dep in tf build corrected from dev --- pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pyproject.toml b/pyproject.toml index 044fe0f833..025807c3d4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -57,7 +57,7 @@ dependencies = [ [project.optional-dependencies] tf = [ "tensorflow>=2.11.0,<3.0.0", # cf. 
https://github.com/mindee/doctr/pull/1182 - "tf2onnx>=1.14.0,<2.0.0", + "tf2onnx>=1.15.1,<2.0.0", # cf.https://github.com/onnx/tensorflow-onnx/releases/tag/v1.15.1 ] torch = [ "torch>=1.12.0,<3.0.0", From 10e6a56f280a5ae91bc47d21464fec3038d537ab Mon Sep 17 00:00:00 2001 From: felix Date: Fri, 8 Sep 2023 12:05:38 +0200 Subject: [PATCH 13/14] update --- docs/source/using_doctr/using_models.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index 41e54b77f6..cd564c7714 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -30,8 +30,8 @@ The following architectures are currently supported: We also provide 2 models working with any kind of rotated documents: -* :py:meth:`linknet_resnet18_rotation ` -* :py:meth:`db_resnet50_rotation ` +* :py:meth:`linknet_resnet18_rotation ` (TensorFlow) +* :py:meth:`db_resnet50_rotation ` (PyTorch) For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets: @@ -222,7 +222,7 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + crnn_mobilenet_v3_large | 70.01 | 74.70 | 83.28 | 80.73 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ -| TensorFlow | db_resnet50 + sar_resnet31 | 69.42 | 74.04 | 80.67 | 78.21 | +| TensorFlow | db_resnet50 + sar_resnet31 | 68.75 | 73.76 | 78.56 | 76.24 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + master | 68.75 | 73.76 | 78.56 | 76.24 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ From 2d7ed4db3ae74c332102f50442db5a0b2f556112 Mon Sep 17 00:00:00 2001 From: felix Date: Fri, 8 Sep 2023 16:06:55 +0200 Subject: [PATCH 14/14] add last tf benchmarks --- docs/source/using_doctr/using_models.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/using_doctr/using_models.rst b/docs/source/using_doctr/using_models.rst index cd564c7714..007f8b2955 100644 --- a/docs/source/using_doctr/using_models.rst +++ b/docs/source/using_doctr/using_models.rst @@ -147,9 +147,9 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | Tensorflow | vitstr_small | (32, 128, 3) | 21.4 M | 83.01 | 83.84 | 86.57 | 87.00 | 2.0 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| TensorFlow | vitstr_base | (32, 128, 3) | 85.2 M | | | | | 5.8 | +| TensorFlow | vitstr_base | (32, 128, 3) | 85.2 M | 85.98 | 86.70 | 90.47 | 90.95 | 5.8 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ -| TensorFlow | parseq | (32, 128, 3) | 23.8 M | | | | | 3.6 | +| TensorFlow | parseq | (32, 128, 3) | 23.8 M | 81.62 | 82.29 | 79.13 | 79.52 | 3.6 | 
+----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ | PyTorch | crnn_vgg16_bn | (32, 128, 3) | 15.8 M | 86.54 | 87.41 | 94.29 | 94.69 | 0.6 | +----------------+---------------------------------+-----------------+--------------+------------+---------------+------------+---------------+--------------------+ @@ -228,9 +228,9 @@ For a comprehensive comparison, we have compiled a detailed benchmark on publicl +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | TensorFlow | db_resnet50 + vitstr_small | 64.58 | 68.91 | 74.66 | 72.37 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ -| TensorFlow | db_resnet50 + vitstr_base | | | | | +| TensorFlow | db_resnet50 + vitstr_base | 66.89 | 71.37 | 79.11 | 76.68 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ -| TensorFlow | db_resnet50 + parseq | | | | | +| TensorFlow | db_resnet50 + parseq | 65.77 | 70.18 | 71.57 | 69.37 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+ | PyTorch | db_resnet50 + crnn_vgg16_bn | 67.82 | 73.35 | 84.84 | 83.27 | +----------------+----------------------------------------------------------+------------+---------------+------------+---------------+