diff --git a/README.md b/README.md index 5a87f8cdaa..4fbcf53828 100644 --- a/README.md +++ b/README.md @@ -4,11 +4,10 @@ [![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square&logo=slack&logoColor=white)](https://slack.mindee.com) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.6.0-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb) - **Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch** - What you can expect from this repository: + - efficient ways to parse textual information (localize and identify each word) from your documents - guidance on how to integrate this in your current architecture @@ -44,7 +43,9 @@ multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jp ``` ### Putting it together + Let's use the default pretrained model for an example: + ```python from doctr.io import DocumentFile from doctr.models import ocr_predictor @@ -57,6 +58,7 @@ result = model(doc) ``` ### Dealing with rotated documents + Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have multiple options to handle it: @@ -69,7 +71,6 @@ will be converted to straight boxes), you need to pass `export_as_straight_boxes If both options are set to False, the predictor will always fit and return rotated boxes. - To interpret your model's predictions, you can visualize them interactively as follows: ```python @@ -89,7 +90,6 @@ plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show() ![Synthesis sample](https://github.com/mindee/doctr/releases/download/v0.3.1/synthesized_sample.png) - The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`). To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure): @@ -100,6 +100,7 @@ json_output = result.export() ``` ### Use the KIE predictor + The KIE predictor is a more flexible predictor compared to OCR as your detection model can detect multiple classes in a document. For example, you can have a detection model to detect just dates and adresses in a document. The KIE predictor makes it possible to use detector with multiple classes with a recognition model and to have the whole pipeline already setup for you. 
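The hunk that follows picks up mid-snippet with the per-class prediction loop, so here is a minimal bridging sketch of the setup such a loop relies on. It is a hedged illustration only: the `kie_predictor` entry point and the per-page `predictions` dictionary (class name mapped to a list of predictions) follow the description in this README, but the exact keyword arguments are assumptions, not text recovered from the lines the diff elides.

```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor  # assumed entry point, mirroring ocr_predictor

# Build a KIE pipeline: a detection model that can emit several classes,
# combined with a recognition model. Architecture names are illustrative.
model = kie_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)

doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
result = model(doc)

# Per the README, each page exposes its results as {class_name: [predictions, ...]}.
predictions = result.pages[0].predictions
for class_name, class_predictions in predictions.items():
    print(f"{len(class_predictions)} prediction(s) for class '{class_name}'")
```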
@@ -121,10 +122,11 @@ for class_name in predictions.keys(): for prediction in list_predictions: print(f"Prediction for {class_name}: {prediction}") ``` -The KIE predictor results per page are in a dictionary format with each key representing a class name and it's value are the predictions for that class. +The KIE predictor results per page are in a dictionary format with each key representing a class name and it's value are the predictions for that class. ### If you are looking for support from the Mindee team + [![Bad OCR test detection image asking the developer if they need help](https://github.com/mindee/doctr/releases/download/v0.5.1/doctr-need-help.png)](https://mindee.com/product/doctr) ## Installation @@ -136,6 +138,7 @@ Python 3.8 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to Since we use [weasyprint](https://weasyprint.readthedocs.io/), you will need extra dependencies if you are not running Linux. For MacOS users, you can install them as follows: + ```shell brew install cairo pango gdk-pixbuf libffi ``` @@ -149,6 +152,7 @@ You can then install the latest release of the package using [pypi](https://pypi ```shell pip install python-doctr ``` + > :warning: Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run. We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows: @@ -166,6 +170,7 @@ For MacBooks with M1 chip, you will need some additional packages or specific ve - PyTorch: [version >= 1.12.0](https://pytorch.org/get-started/locally/#start-locally) ### Developer mode + Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git). First clone the project repository: @@ -175,6 +180,7 @@ pip install -e doctr/. ``` Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build: + ```shell # for TensorFlow pip install -e doctr/.[tf] @@ -182,15 +188,17 @@ pip install -e doctr/.[tf] pip install -e doctr/.[torch] ``` - ## Models architectures + Credits where it's due: this repository is implementing, among others, architectures from published research papers. ### Text Detection + - DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf). - LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf) ### Text Recognition + - CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf). - SAR: [Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf). - MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf). @@ -203,7 +211,6 @@ Credits where it's due: this repository is implementing, among others, architect The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications. - ### Demo app A minimal demo app is provided for you to play with our end-to-end OCR models! @@ -220,9 +227,11 @@ Check it out [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%2 If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required. 
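The detection and recognition architectures listed in the hunks above can be mixed and matched when instantiating a predictor. A hedged sketch, assuming the `det_arch`/`reco_arch` keyword names and reusing the model identifiers that appear in the reference training scripts later in this diff:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Select a detection and a recognition architecture by name, both with
# pretrained weights; any pairing from the lists above should work the same way.
model = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)

doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
result = model(doc)
print(result)
```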
##### Tensorflow version + ```shell pip install -r demo/tf-requirements.txt ``` + Then run your app in your default browser with: ```shell @@ -230,9 +239,11 @@ USE_TF=1 streamlit run demo/app.py ``` ##### PyTorch version + ```shell pip install -r demo/pt-requirements.txt ``` + Then run your app in your default browser with: ```shell @@ -246,7 +257,6 @@ Check out our [TensorFlow.js demo](https://github.com/mindee/doctr-tfjs-demo) to ![TFJS demo](https://github.com/mindee/doctr-tfjs-demo/releases/download/v0.1-models/demo_illustration_mini.png) - ### Docker container If you wish to deploy containerized environments, you can use the provided Dockerfile to build a docker image: @@ -262,21 +272,24 @@ An example script is provided for a simple documentation analysis of a PDF or im ```shell python scripts/analyze.py path/to/your/doc.pdf ``` -All script arguments can be checked using `python scripts/analyze.py --help` +All script arguments can be checked using `python scripts/analyze.py --help` ### Minimal API integration Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework. #### Deploy your API locally + Specific dependencies are required to run the API template, which you can install as follows: + ```shell cd api/ pip install poetry make lock pip install -r requirements.txt ``` + You can now run your API locally: ```shell @@ -284,6 +297,7 @@ uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main: ``` Alternatively, you can run the same server on a docker container if you prefer using: + ```shell PORT=8002 docker-compose up -d --build ``` @@ -300,8 +314,8 @@ response = requests.post("http://localhost:8002/ocr", files={'file': data}).json ``` ### Example notebooks -Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview. +Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview. ## Citation @@ -317,14 +331,12 @@ If you wish to cite this project, feel free to use this [BibTeX](http://www.bibt } ``` - ## Contributing If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way? You're in luck, we compiled a short guide (cf. [`CONTRIBUTING`](CONTRIBUTING.md)) for you to easily do so! - ## License Distributed under the Apache 2.0 License. See [`LICENSE`](LICENSE) for more information. diff --git a/api/README.md b/api/README.md index 502403ab8e..426e191bf2 100644 --- a/api/README.md +++ b/api/README.md @@ -9,17 +9,18 @@ You will only need to install [Git](https://git-scm.com/book/en/v2/Getting-Start ### Starting your web server You will need to clone the repository first, go into `api` folder and start the api: + ```shell git clone https://github.com/mindee/doctr.git cd doctr/api make run ``` + Once completed, your [FastAPI](https://fastapi.tiangolo.com/) server should be running on port 8080. ### Documentation and swagger -FastAPI comes with many advantages including speed and OpenAPI features. 
For instance, once your server is running, you can access the automatically built documentation and swagger in your browser at: http://localhost:8080/docs - +FastAPI comes with many advantages including speed and OpenAPI features. For instance, once your server is running, you can access the automatically built documentation and swagger in your browser at: [http://localhost:8080/docs](http://localhost:8080/docs) ### Using the routes @@ -40,12 +41,12 @@ print(requests.post("http://localhost:8080/detection", files={'file': data}).jso ``` should yield -``` + +```json [{'box': [0.826171875, 0.185546875, 0.90234375, 0.201171875]}, {'box': [0.75390625, 0.185546875, 0.8173828125, 0.201171875]}] ``` - #### Text recognition Using the following image: @@ -61,11 +62,11 @@ print(requests.post("http://localhost:8080/recognition", files={'file': data}).j ``` should yield -``` + +```json {'value': 'invite'} ``` - #### End-to-end OCR Using the following image: @@ -81,7 +82,8 @@ print(requests.post("http://localhost:8080/ocr", files={'file': data}).json()) ``` should yield -``` + +```json [{'box': [0.75390625, 0.185546875, 0.8173828125, 0.201171875], 'value': 'Hello'}, {'box': [0.826171875, 0.185546875, 0.90234375, 0.201171875], diff --git a/references/classification/README.md b/references/classification/README.md index 6f612aa60d..b802f63ba1 100644 --- a/references/classification/README.md +++ b/references/classification/README.md @@ -18,13 +18,13 @@ You can start your training in TensorFlow: ```shell python references/classification/train_tensorflow.py mobilenet_v3_large --epochs 5 ``` + or PyTorch: ```shell python references/classification/train_pytorch.py mobilenet_v3_large --epochs 5 --device 0 ``` - ## Advanced options Feel free to inspect the multiple script option to customize your training to your own needs! diff --git a/references/detection/README.md b/references/detection/README.md index e2ff24dfbc..7a07b4cb6b 100644 --- a/references/detection/README.md +++ b/references/detection/README.md @@ -18,6 +18,7 @@ You can start your training in TensorFlow: ```shell python references/detection/train_tensorflow.py path/to/your/train_set path/to/your/val_set db_resnet50 --epochs 5 ``` + or PyTorch: ```shell @@ -26,14 +27,14 @@ python references/detection/train_pytorch.py path/to/your/train_set path/to/your ## Data format -You need to provide both `train_path` and `val_path` arguments to start training. +You need to provide both `train_path` and `val_path` arguments to start training. Each path must lead to folder with 1 subfolder and 1 file: ```shell ├── images │ ├── sample_img_01.png │ ├── sample_img_02.png -│ ├── sample_img_03.png +│ ├── sample_img_03.png │ └── ... └── labels.json ``` @@ -42,6 +43,7 @@ Each JSON file must be a dictionary, where the keys are the image file names and The order of the points does not matter inside a polygon. Points are (x, y) absolutes coordinates. labels.json + ```shell { "sample_img_01.png" = { @@ -57,9 +59,11 @@ labels.json ... } ``` + If you want to train a model with multiple classes, you can use the following format where polygons is a dictionnary where each key represents one class and has all the polygons representing that class. labels.json + ```shell { "sample_img_01.png": { @@ -81,6 +85,7 @@ labels.json ... } ``` + ## Advanced options Feel free to inspect the multiple script option to customize your training to your own needs! 
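Because the detection label layout described above is easy to get subtly wrong (image file names as keys, polygons given as absolute (x, y) points, optionally grouped per class), a small stdlib-only sanity check can save a failed training run. This is a hedged sketch based solely on the two formats shown in this README: it assumes a `polygons` field that is either a list of polygons (single class) or a dictionary mapping class names to polygon lists (multi-class), and it ignores any other per-image fields.

```python
import json
from pathlib import Path


def check_detection_labels(dataset_dir: str) -> None:
    """Lightly validate an images/ + labels.json detection set as described above."""
    root = Path(dataset_dir)
    labels = json.loads((root / "labels.json").read_text())

    for img_name, entry in labels.items():
        if not (root / "images" / img_name).is_file():
            print(f"missing image file: {img_name}")
        polygons = entry.get("polygons", [])
        # Single-class sets use a list of polygons, multi-class sets a dict of
        # class name -> list of polygons (matching the two examples above).
        groups = polygons.values() if isinstance(polygons, dict) else [polygons]
        for group in groups:
            for poly in group:
                if len(poly) < 3 or any(len(point) != 2 for point in poly):
                    print(f"unexpected polygon in {img_name}: {poly}")


check_detection_labels("path/to/your/train_set")
```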
diff --git a/references/recognition/README.md b/references/recognition/README.md index 7ae48ca78c..5fa551016d 100644 --- a/references/recognition/README.md +++ b/references/recognition/README.md @@ -18,6 +18,7 @@ You can start your training in TensorFlow: ```shell python references/recognition/train_tensorflow.py crnn_vgg16_bn --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5 ``` + or PyTorch: ```shell @@ -28,13 +29,16 @@ python references/recognition/train_pytorch.py crnn_vgg16_bn --train_path path/t Multi-GPU support on recognition task with PyTorch has been added. It'll be probably added for other tasks. Arguments are the same than the ones from single GPU, except: + - `--devices`: **by default, if you do not pass `--devices`, it will use all GPUs on your computer**. You can use specific GPUs by passing a list of ids (ex: `0 1 2`). To find them, you can use the following snippet: + ```python import torch devices = [torch.cuda.device(i) for i in range(torch.cuda.device_count())] device_names = [torch.cuda.get_device_name(d) for d in devices] ``` + - `--backend`: you can specify another `backend` for `DistribuedDataParallel` if the default one is not available on your operating system. Fastest one is `nccl` according to [PyTorch Documentation](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html). @@ -42,7 +46,6 @@ your operating system. Fastest one is `nccl` according to [PyTorch Documentation python references/recognition/train_pytorch_ddp.py crnn_vgg16_bn --train_path path/to/your/train_set --val_path path/to/your/val_set --epochs 5 --devices 0 1 --backend nccl ``` - ## Data format You need to provide both `train_path` and `val_path` arguments to start training. @@ -60,7 +63,7 @@ Each of these paths must lead to a 2-elements folder: The JSON files must contain word-labels for each picture as a string. The order of entries in the json does not matter. -```shell +```shell # labels.json { "img_1.jpg": "I", @@ -81,13 +84,16 @@ Feel free to inspect the multiple script option to customize your training to yo ```python python references/recognition/train_pytorch.py --help ``` + ## Using custom fonts + If you want to use your own custom fonts for training, make sure the font is installed on your OS. Do so on linux by copying the .ttf file to the desired directory with: ```sudo cp custom-font.ttf /usr/local/share/fonts/``` and then running ```fc-cache -f -v``` to build the font cache. Keep in mind that passing fonts to the training script will only work with the WordGenerator which will not augment or change images from the dataset if it is passed as argument. If no path to a dataset is passed like in this command ```python3 doctr/references/recognition/train_pytorch.py crnn_mobilenet_v3_small --vocab french --font "custom-font.ttf"``` only then is the WordGenerator "triggered" to create random images from the given vocab and font. Running the training script should look like this for multiple custom fonts: + ```shell python references/recognition/train_pytorch.py crnn_vgg16_bn --epochs 5 --font "custom-font-1.ttf,custom-font-2.ttf" ```
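To close out the recognition references, here is a minimal sketch of producing the flat `labels.json` mapping described above (image file name to transcription) and flagging labelled images that are missing from `images/`. Only the output format comes from this README; the helper name and the sample transcriptions beyond `img_1.jpg` are illustrative.

```python
import json
from pathlib import Path
from typing import Dict


def write_recognition_labels(dataset_dir: str, samples: Dict[str, str]) -> None:
    """Write the flat {"img_name.jpg": "transcription"} mapping described above."""
    root = Path(dataset_dir)
    missing = [name for name in samples if not (root / "images" / name).is_file()]
    if missing:
        print(f"warning: {len(missing)} labelled image(s) not found under images/")
    (root / "labels.json").write_text(json.dumps(samples, ensure_ascii=False, indent=4))


# Transcriptions other than the first are made up for the example.
write_recognition_labels(
    "path/to/your/train_set",
    {"img_1.jpg": "I", "img_2.jpg": "love", "img_3.jpg": "docTR"},
)
```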