Flipped text recognition prediction. #1455

decadance-dance · 2024-02-07T08:54:16Z

Bug description

When I set the option assume_straight_pages=False, some of the predictions are turned upside down.
I tried db_resnet34 and db_resnet50 for detection with master and parseq for recognition, and observed this bug with every pair.

Code snippet to reproduce the bug

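from doctr.models import ocr_predictor
from doctr.io import DocumentFile


# Load the test image as a document (a list of pages)
doc = DocumentFile.from_images("./gh.png")

# Detection: db_resnet50, recognition: parseq
# assume_straight_pages=False: 4-point (quad) word geometries instead of 2-point boxes
model = ocr_predictor(
    'db_resnet50',
    'parseq',
    pretrained=True,
    assume_straight_pages=False,
).cuda().half()

result = model(doc)
print(result)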

Error traceback

...
Line(
  (words): [
              Word(value='ster', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Graham', confidence=1.0),
              Word(value='6661]', confidence=0.95),         <-- Should be '[1999'
              Word(value='and', confidence=1.0),
              Word(value='2012],', confidence=1.0),
              Word(value='Gamba', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Graham', confidence=1.0),
              Word(value='[2018]', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Axelrod', confidence=1.0),
              Word(value='[2018).', confidence=0.99),
            ]
),
...
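The misread '6661]' is exactly what '[1999' looks like when its crop is rotated by 180°: the reading order reverses and each glyph is turned upside down. A minimal sketch of that mapping, using a small hand-written glyph table (an assumption for illustration; it only covers a few digits and brackets and is not part of doctr):

```python
# Hand-written table of characters that read as another character when
# rotated by 180 degrees. Illustrative only: a few digits and brackets.
ROTATE_180 = {
    "0": "0", "1": "1", "6": "9", "8": "8", "9": "6",
    "[": "]", "]": "[", "(": ")", ")": "(",
}


def rotate_text_180(text: str) -> str:
    """Return how `text` would read if its crop were turned upside down.

    A 180-degree rotation reverses the reading order and flips each glyph.
    Characters missing from the table are kept unchanged (an approximation).
    """
    return "".join(ROTATE_180.get(ch, ch) for ch in reversed(text))


print(rotate_text_180("[1999"))  # 6661]
```

Since the other words in the line are read correctly, this points to a single word crop being flipped rather than the whole line.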

Environment

Collecting environment information...

DocTR version: 0.8.0a0
TensorFlow version: N/A
PyTorch version: 2.1.0a0+4136153 (torchvision 0.16.0a0)
OpenCV version: 4.9.0
OS: Ubuntu 22.04.2 LTS
Python version: 3.10.6
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.105
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 525.147.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2

Deep Learning backend

is_tf_available: False
is_torch_available: True


felixdittrich92 · 2024-02-07T09:11:23Z

Hi @decadance-dance 👋

Yeah, this depends on the crop orientation classifier, which isn't 100% robust at the moment.
We will retrain it after the next release; a training script has already been added :)

CC @odulcy-mindee

decadance-dance · 2024-02-07T09:36:07Z

@felixdittrich92, got it, thanks. BTW, maybe you know an easy way to work around it in my case: I want to get quads (4 points) instead of rectangles (2 points) as the detector output, even if my page is straight.
That is, in a real scenario I will receive straight documents, so I don't really need to detect their orientation and rotate them, but I still need rectification to feed the crops to the recognizer.
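If only the 4-point format is needed downstream and the text really is axis-aligned, a straight box can be expanded into a quad after prediction, outside the pipeline. A minimal sketch, assuming a straight-page geometry of the form ((xmin, ymin), (xmax, ymax)) in relative coordinates; it only changes the representation and cannot recover any skew, so it gives no rectification:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_to_quad(box: Tuple[Point, Point]) -> List[Point]:
    """Expand ((xmin, ymin), (xmax, ymax)) into its 4 corners.

    Corner order: top-left, top-right, bottom-right, bottom-left
    (image coordinates, y grows downwards).
    """
    (xmin, ymin), (xmax, ymax) = box
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]


print(box_to_quad(((0.1, 0.2), (0.4, 0.25))))
# [(0.1, 0.2), (0.4, 0.2), (0.4, 0.25), (0.1, 0.25)]
```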

felixdittrich92 · 2024-02-07T09:57:34Z

Hm, could you explain this in a bit more detail? If your images contain only straight text, rectification should not be a problem.

If you want to modify the detector output in the middle of the pipeline, before it is passed to the recognition model, #1449 could help (note: the input and output signatures need to be the same, so converting rectangles to quads within the same pipeline will not work).
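The constraint in the note can be illustrated with a framework-free sketch: a post-processing step that receives the detector's per-page boxes must return exactly the same structure (same number of pages, boxes and points per box). The function names and the list-of-points layout below are hypothetical and are not the signature introduced in #1449. This step only clamps relative coordinates into [0, 1]; replacing 2-point boxes with 4-point quads would fail the structure check.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = List[Point]   # hypothetical layout: corner points of one box
Page = List[Box]    # all boxes detected on one page


def clamp_boxes(pages: List[Page]) -> List[Page]:
    """Hypothetical post-processing step: clamp coordinates into [0, 1].

    The output keeps the input structure: same pages, boxes and points.
    """
    return [
        [[(min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)) for x, y in box] for box in page]
        for page in pages
    ]


def same_structure(a: List[Page], b: List[Page]) -> bool:
    """Check that two detector outputs have identical nesting and lengths."""
    return len(a) == len(b) and all(
        len(pa) == len(pb) and all(len(ba) == len(bb) for ba, bb in zip(pa, pb))
        for pa, pb in zip(a, b)
    )


pages = [[[(-0.01, 0.1), (0.5, 0.1), (0.5, 1.02), (-0.01, 1.02)]]]
out = clamp_boxes(pages)
print(out)  # [[[(0.0, 0.1), (0.5, 0.1), (0.5, 1.0), (0.0, 1.0)]]]
print(same_structure(pages, out))  # True
```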

decadance-dance · 2024-02-08T16:26:49Z

@felixdittrich92 All my documents are straight, so I could use assume_straight_pages=True, but then I would get rectangles (two points) as the detector output. I need quads (four points) from the detector, so I use assume_straight_pages=False. However, this option sometimes causes problems such as the one described in this issue.
So I'm looking for a way to get four points from the detector while avoiding upside-down crops.
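When text is known to be rotated by less than 45°, one geometric way to keep 4 points without producing upside-down crops is to skip any predicted orientation and order the corners canonically before warping the crop. A minimal sketch of that generic technique (not doctr's implementation): for a roughly rectangular quad rotated strictly within ±45°, in image coordinates with y pointing down, the top-left corner minimises x + y, the bottom-right maximises it, the top-right minimises y - x and the bottom-left maximises it.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def order_corners(quad: List[Point]) -> List[Point]:
    """Order 4 corners as top-left, top-right, bottom-right, bottom-left.

    Assumes image coordinates (y grows downwards), a roughly rectangular quad
    rotated by strictly less than 45 degrees, and a page that is not itself
    upside down. A crop warped so that the first corner maps to its top-left
    is then upright.
    """
    top_left = min(quad, key=lambda p: p[0] + p[1])
    bottom_right = max(quad, key=lambda p: p[0] + p[1])
    top_right = min(quad, key=lambda p: p[1] - p[0])
    bottom_left = max(quad, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


# A rectangle tilted by about 14 degrees, with its corners given in random order
print(order_corners([(47, 42), (7, 32), (50, 30), (10, 20)]))
# [(10, 20), (50, 30), (47, 42), (7, 32)]
```

The ordered quad can then be passed to any perspective-warp routine; because the order is fixed by geometry, a 180° flip cannot be introduced at this step.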

nikhilanj · 2024-07-18T18:37:01Z

@felixdittrich92
Hi,
I'm facing similar issues with v0.8.1 on text that is rotated by up to +/- 45 degrees.
I see this issue is referenced in the v0.9.0 and v0.10.0 release trackers.
Is there a way I can test the new model/checkpoint?
PR #1608 has a new TF checkpoint, but I'm using PyTorch.

milosacimovic · 2024-09-06T07:23:15Z

I'm facing the same issue. If the rotation is below 45 degrees, there is no real need for 90, 180 or 270 degree corrections, but I still want polygons as the output of text detection.
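The argument can be made concrete: if the coarse correction is chosen by snapping the text angle to the nearest multiple of 90°, any angle strictly within ±45° snaps to 0°, so the 90°, 180° and 270° corrections never apply. A minimal sketch, assuming the quad's first two points are its top-left and top-right corners in image coordinates; this illustrates the reasoning and is not how doctr's orientation classifier works.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def top_edge_angle(quad: List[Point]) -> float:
    """Angle of the top edge in degrees, in (-180, 180].

    Assumes quad[0] is the top-left and quad[1] the top-right corner, in image
    coordinates (y grows downwards, so positive angles are clockwise on screen).
    """
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def coarse_correction(angle: float) -> int:
    """Snap an angle to the nearest multiple of 90 degrees, in {0, 90, 180, 270}."""
    return round(angle / 90.0) * 90 % 360


quad = [(10, 20), (50, 30), (47, 42), (7, 32)]
angle = top_edge_angle(quad)
print(round(angle, 1), coarse_correction(angle))  # 14.0 0
```

A text line whose top edge runs right to left (angle near ±180°) would snap to 180, which is the only case where a flip is warranted.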

decadance-dance added the type: bug label Feb 7, 2024

felixdittrich92 added the framework: pytorch, framework: tensorflow and topic: character classification labels Feb 7, 2024

felixdittrich92 added this to the 0.9.0 milestone Feb 7, 2024

felixdittrich92 self-assigned this Feb 9, 2024

felixdittrich92 mentioned this issue Feb 9, 2024 in Release tracker - v0.9.0 #1074 (closed)

felixdittrich92 mentioned this issue Jun 6, 2024 in Release tracker - v0.10.0 #1634 (closed)

felixdittrich92 mentioned this issue Aug 8, 2024 in Release tracker - v0.9.1 #1688 (closed)

felixdittrich92 modified the milestones: 0.9.0, 0.9.1 Aug 8, 2024

milosacimovic mentioned this issue Sep 12, 2024 in Feature/assume straight text #1723 (closed)

felixdittrich92 linked a pull request Sep 27, 2024 that will close this issue: Feature/assume straight text #1723 (closed)

felixdittrich92 removed a link to a pull request Sep 27, 2024: Feature/assume straight text #1723 (closed)

felixdittrich92 linked a pull request Sep 27, 2024 that will close this issue: Disable page and crop orientation #1735 (merged)

felixdittrich92 closed this as completed in #1735 Sep 27, 2024

felixdittrich92 modified the milestones: 0.9.1, 0.10.0 Oct 10, 2024
