
feat: Alignment during reconstruction of image #1657

Closed
wants to merge 1 commit into from


SkaarFacee
Contributor

No description provided.


codecov bot commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.40%. Comparing base (1cea7d8) to head (b912996).
Report is 18 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1657      +/-   ##
==========================================
+ Coverage   96.35%   96.40%   +0.04%
==========================================
  Files         164      164
  Lines        7773     7780       +7
==========================================
+ Hits         7490     7500      +10
+ Misses        283      280       -3
```

| Flag | Coverage Δ |
|---|---|
| unittests | 96.40% <100.00%> (+0.04%) ⬆️ |

Flags with carried forward coverage won't be shown.


@felixdittrich92
Copy link
Contributor

Hi @SkaarFacee 👋,

Thanks for the PR.

```diff
@@ -38,14 +38,18 @@ def synthesize_page(
    # Draw each word
    for block in page["blocks"]:
        for line in block["lines"]:
            line_ymin = min(int(round(h * word["geometry"][0][1])) for word in line["words"])
```
Contributor


I would suggest the following:

```python
from typing import Any, Dict, Optional

import numpy as np


def synthesize_page(
    page: Dict[str, Any],
    draw_proba: bool = False,
    font_family: Optional[str] = None,
    adjust_to_line: bool = False,
) -> np.ndarray:
    """Draw the content of the element page (OCR response) on a blank page.

    Args:
    ----
        page: exported Page object to represent
        draw_proba: if True, draw words in colors to represent confidence. Blue: p=1, red: p=0
        font_family: family of the font
        adjust_to_line: if True, adjust y coordinates to line geometry

    Returns:
    -------
        the synthesized page
    """
    # Draw template
    h, w = page["dimensions"]
    response = 255 * np.ones((h, w, 3), dtype=np.int32)

    # Draw each word
    for block in page["blocks"]:
        multiline = len(block["lines"]) > 1
        for line in block["lines"]:
            for word in line["words"]:
                # Get absolute word geometry
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                xmin, xmax = int(round(w * xmin)), int(round(w * xmax))

                if multiline and adjust_to_line:
                    # Snap the word to the vertical extent of its line
                    ymin = int(round(h * line["geometry"][0][1]))
                    ymax = int(round(h * line["geometry"][1][1]))
                else:
                    # Keep the word's own vertical extent
                    ymin, ymax = int(round(h * ymin)), int(round(h * ymax))

                # ... font selection, drawing and `return response` continue unchanged
```

This way the user can still decide, and adjusting only makes sense if we have lines (i.e. resolve_lines=True).
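To make the vertical-coordinate choice in the suggestion easy to test without rendering anything, here is a minimal sketch that pulls it out into a standalone helper. It assumes the exported page layout used in the snippet (blocks → lines → words, each with a relative ((xmin, ymin), (xmax, ymax)) geometry); the helper name word_y_bounds and the toy page values are hypothetical and not part of doctr.

```python
from typing import Any, Dict, Tuple


def word_y_bounds(
    block: Dict[str, Any],
    line: Dict[str, Any],
    word: Dict[str, Any],
    h: int,
    adjust_to_line: bool = False,
) -> Tuple[int, int]:
    """Hypothetical helper: absolute (ymin, ymax) in pixels for one word.

    Mirrors the suggested logic: snap to the line's vertical extent only when
    the block has several lines and the caller opted in.
    """
    multiline = len(block["lines"]) > 1
    if multiline and adjust_to_line:
        # Every word on the line shares the line's vertical band
        (_, ymin), (_, ymax) = line["geometry"]
    else:
        # Default: keep the word's own relative geometry
        (_, ymin), (_, ymax) = word["geometry"]
    return int(round(h * ymin)), int(round(h * ymax))


# Toy block with two lines; the words on the first line differ slightly in height
word_a = {"value": "Hello", "geometry": ((0.10, 0.11), (0.30, 0.19))}
word_b = {"value": "world", "geometry": ((0.35, 0.12), (0.55, 0.20))}
line_1 = {"geometry": ((0.10, 0.10), (0.55, 0.20)), "words": [word_a, word_b]}
line_2 = {"geometry": ((0.10, 0.25), (0.40, 0.35)), "words": []}
block = {"lines": [line_1, line_2]}

print(word_y_bounds(block, line_1, word_a, h=1000))                       # (110, 190)
print(word_y_bounds(block, line_1, word_a, h=1000, adjust_to_line=True))  # (100, 200)
```

Because the multi-line check happens per block, a single-line block keeps each word's own height even when adjust_to_line=True.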

Contributor Author


Okay, I will do that right away

```python
# White drawing context adapted to font size, 0.75 factor to convert pts --> pix
font = get_font(font_family, int(0.75 * (ymax - ymin)))
ymin, ymax = line_ymin, line_ymax
calculate_font_size = int(0.75 * (ymax - ymin))
```
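For comparison, a sketch of what the PR's diff computes: one vertical band per line derived from the words' own boxes, then a single font size from that band using the same 0.75 factor. The line_ymin expression is taken from the diff; defining line_ymax as the maximum word ymax is an assumption, since its definition is not shown here. The helper name line_band_and_font_size is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def line_band_and_font_size(words: List[Dict[str, Any]], h: int) -> Tuple[int, int, int]:
    """Hypothetical helper: shared (ymin, ymax, font_size) for all words of a line."""
    # Top of the band: highest word top edge (same expression as line_ymin in the diff)
    line_ymin = min(int(round(h * word["geometry"][0][1])) for word in words)
    # Bottom of the band: lowest word bottom edge (assumed definition of line_ymax)
    line_ymax = max(int(round(h * word["geometry"][1][1])) for word in words)
    # Same 0.75 scaling factor as the diff's font-size computation
    font_size = int(0.75 * (line_ymax - line_ymin))
    return line_ymin, line_ymax, font_size


words = [
    {"value": "Hello", "geometry": ((0.10, 0.11), (0.30, 0.19))},
    {"value": "world", "geometry": ((0.35, 0.12), (0.55, 0.20))},
]
print(line_band_and_font_size(words, h=1000))  # (110, 200, 67)
```

Unlike snapping to line["geometry"], this band depends only on the recognized words, so it can differ from the detected line box.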
Contributor


This still does not work well, see:

  • the text still overlaps

[Screenshots: "Screenshot from 2024-06-25 09-29-07", "test_page", "Screenshot from 2024-06-25 09-30-29"]
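One way to narrow down the overlap reported above is to check, before anything is drawn, whether the pixel bands of vertically adjacent lines in a block intersect. This is a minimal debugging sketch, assuming each line carries a relative ((xmin, ymin), (xmax, ymax)) geometry; find_overlapping_lines is a hypothetical helper, not a doctr function, and it only compares neighbours after sorting by top edge.

```python
from typing import Any, Dict, List, Tuple


def find_overlapping_lines(block: Dict[str, Any], h: int) -> List[Tuple[int, int, int]]:
    """Hypothetical debug helper.

    Returns (index_a, index_b, overlap_px) for each pair of vertically adjacent
    lines (sorted by top edge) whose pixel bands intersect.
    """
    bands = []
    for idx, line in enumerate(block["lines"]):
        (_, ymin), (_, ymax) = line["geometry"]
        bands.append((int(round(h * ymin)), int(round(h * ymax)), idx))
    bands.sort()  # top-to-bottom by ymin

    overlaps = []
    for (_, bottom_a, idx_a), (top_b, bottom_b, idx_b) in zip(bands, bands[1:]):
        # Positive when the upper band extends past the top of the lower band
        overlap = min(bottom_a, bottom_b) - top_b
        if overlap > 0:
            overlaps.append((idx_a, idx_b, overlap))
    return overlaps


block = {
    "lines": [
        {"geometry": ((0.1, 0.10), (0.5, 0.20))},
        {"geometry": ((0.1, 0.18), (0.5, 0.28))},
        {"geometry": ((0.1, 0.30), (0.5, 0.40))},
    ]
}
print(find_overlapping_lines(block, h=1000))  # [(0, 1, 20)]
```

If such pairs show up, drawing each word into its full line band will overwrite part of the neighbouring line, regardless of the font size chosen.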

Contributor Author


That is interesting. I shall debug this to see why the issue occurs.

Contributor Author


Hey there, what models did you use for these?

Contributor


Hey :)

fast_base + parseq, and db_mobilenet_v3_large + crnn_mobilenet_v3_large

felixdittrich92 linked an issue on Jun 25, 2024 that may be closed by this pull request
felixdittrich92 added this to the 0.10.0 milestone on Jun 25, 2024
felixdittrich92 added the type: enhancement and module: utils labels on Jun 25, 2024
felixdittrich92 marked this pull request as draft on Jun 25, 2024 07:34
@felixdittrich92
Contributor

#1750

felixdittrich92 removed this from the 0.10.0 milestone on Oct 10, 2024
Labels
module: utils Related to doctr.utils type: enhancement Improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[reconstitution] Improve synthesize output quality
2 participants