Flipped text recognition prediction. #1455

Closed
Tracked by #1688
decadance-dance opened this issue Feb 7, 2024 · 6 comments · Fixed by #1735
Labels: framework: pytorch, framework: tensorflow, topic: character classification, type: bug

Comments

@decadance-dance

Bug description

When I set the option assume_straight_pages=False, some of the predictions may be turned upside down.
I tried db_resnet34 and db_resnet50 for detection, and master and parseq for recognition. I observed this bug for every pair.

Code snippet to reproduce the bug

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load the page image that triggers the flipped prediction.
doc = DocumentFile.from_images("./gh.png")

# Non-straight mode: the detector returns quads instead of rectangles.
model = ocr_predictor(
    "db_resnet50",
    "parseq",
    pretrained=True,
    assume_straight_pages=False,
).cuda().half()

result = model(doc)
print(result)

Error traceback

...
Line(
  (words): [
              Word(value='ster', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Graham', confidence=1.0),
              Word(value='6661]', confidence=0.95),         <-- Should be '[1999'
              Word(value='and', confidence=1.0),
              Word(value='2012],', confidence=1.0),
              Word(value='Gamba', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Graham', confidence=1.0),
              Word(value='[2018]', confidence=1.0),
              Word(value='and', confidence=1.0),
              Word(value='Axelrod', confidence=1.0),
              Word(value='[2018).', confidence=0.99),
            ]
),
...

[Image: gh_mark]

Environment

Collecting environment information...

DocTR version: 0.8.0a0
TensorFlow version: N/A
PyTorch version: 2.1.0a0+4136153 (torchvision 0.16.0a0)
OpenCV version: 4.9.0
OS: Ubuntu 22.04.2 LTS
Python version: 3.10.6
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.105
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 525.147.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2

Deep Learning backend

is_tf_available: False
is_torch_available: True

@decadance-dance decadance-dance added the type: bug Something isn't working label Feb 7, 2024
@felixdittrich92
Contributor

Hi @decadance-dance 👋

Yeah, this depends on the crop orientation classifier, which isn't 100% robust atm.
We will retrain it after the next release; a training script has already been added :)

CC @odulcy-mindee

@felixdittrich92 felixdittrich92 added framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend topic: character classification Related to the task of character classification labels Feb 7, 2024
@felixdittrich92 felixdittrich92 added this to the 0.9.0 milestone Feb 7, 2024
@decadance-dance
Author

@felixdittrich92, got it, thanks. BTW, do you know an easy way to work around it in my case? I want the detector to output quads (4 pts) instead of rectangles (2 pts), even when my page is straight.
That is, in a real scenario I will receive straight documents, so I don't need to detect their orientation and rotate them, but I still need rectification to feed crops to the recognizer.

@felixdittrich92
Contributor

Hm, could you explain this in a bit more detail? If your images contain only straight text, the rectification should not be a problem.

If we are talking about modifying the detector output in the middle of the pipeline, before it is passed to the recognition model, #1449 could help. Note: the input and output signatures need to be the same, so converting from rect to quad within the same pipeline will not work.

@decadance-dance
Author

@felixdittrich92 All my documents are straight, so I could use assume_straight_pages=True, but then the detector would output rectangles (two points). I need quads (four points) from the detector, so I use assume_straight_pages=False, which sometimes causes problems such as the one described in this issue.
So I'm looking for a way to get four points from the detector while avoiding upside-down crops.
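For the output-format half of this request, one possible approach is to keep assume_straight_pages=True and expand each two-point rectangle into four corner points afterwards. The sketch below is plain Python and makes no docTR calls; it assumes a straight box is given as ((xmin, ymin), (xmax, ymax)), and the helper name rect_to_quad and the sample boxes are hypothetical. It does not rectify crops, so it only helps when four-point geometry is all that is needed downstream.

```python
def rect_to_quad(rect):
    """Expand ((xmin, ymin), (xmax, ymax)) into four corner points.

    Corner order is top-left, top-right, bottom-right, bottom-left
    (an assumed convention, with y growing downward as in image coordinates).
    """
    (xmin, ymin), (xmax, ymax) = rect
    return ((xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax))


# Hypothetical straight boxes in relative page coordinates.
rects = [((0.10, 0.20), (0.50, 0.40)), ((0.55, 0.20), (0.90, 0.40))]
quads = [rect_to_quad(r) for r in rects]
print(quads[0])
```

Because the conversion happens after the predictor has returned, it sidesteps the constraint that a mid-pipeline modification must keep the same input and output signature.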

@nikhilanj

nikhilanj commented Jul 18, 2024

@felixdittrich92
Hi,
I'm facing similar issues with v0.8.1 when operating on text that is rotated by up to +/- 45 degrees.
I see the issue mentions v0.9.0 and v0.10.0.
Is there a way I can test the new model/checkpoint?
PR #1608 has a new TF checkpoint, but I'm using PyTorch.

@felixdittrich92 felixdittrich92 modified the milestones: 0.9.0, 0.9.1 Aug 8, 2024
@milosacimovic
Contributor

Facing the same issue. If the rotation is below 45 degrees, there is no real need for the 90, 180, or 270 degree corrections, but I still want polygons as the output of text detection.
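The 45 degree argument can be illustrated with a small sketch: snapping a small rotation to the nearest multiple of 90 degrees always yields 0, so a quarter-turn correction has nothing to do. This is plain Python and not docTR's implementation; the function names are hypothetical, and it assumes each quad is ordered top-left, top-right, bottom-right, bottom-left in reading direction, which a detector does not necessarily guarantee.

```python
import math


def top_edge_angle(quad):
    """Angle in degrees of the edge from the first to the second point.

    Assumes the quad is ordered top-left, top-right, bottom-right,
    bottom-left, with y growing downward as in image coordinates.
    """
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def nearest_quarter_turn(angle_deg):
    """Snap an angle to the nearest multiple of 90 degrees, in [0, 360)."""
    return (round(angle_deg / 90.0) * 90) % 360


def needs_quarter_turn_correction(quad):
    """True when the quad is closer to 90, 180, or 270 degrees than to 0."""
    return nearest_quarter_turn(top_edge_angle(quad)) != 0


# Hypothetical quads: one tilted by about 27 degrees, one upside down.
tilted = ((0, 0), (2, 1), (1.5, 2), (-0.5, 1))
flipped = ((1, 1), (0, 1), (0, 0), (1, 0))
print(needs_quarter_turn_correction(tilted))
print(needs_quarter_turn_correction(flipped))
```

Under these assumptions, any quad tilted by less than 45 degrees maps to a correction of 0, which matches the observation that the quarter-turn step is unnecessary for such inputs.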

@felixdittrich92 felixdittrich92 linked a pull request Sep 27, 2024 that will close this issue
@felixdittrich92 felixdittrich92 removed a link to a pull request Sep 27, 2024
@felixdittrich92 felixdittrich92 linked a pull request Sep 27, 2024 that will close this issue
@felixdittrich92 felixdittrich92 modified the milestones: 0.9.1, 0.10.0 Oct 10, 2024
4 participants