feat: Alignment during reconstruction of image #1657

Closed
12 changes: 9 additions & 3 deletions doctr/utils/reconstitution.py
@@ -38,14 +38,18 @@ def synthesize_page(
    # Draw each word
    for block in page["blocks"]:
        for line in block["lines"]:
            line_ymin = min(int(round(h * word["geometry"][0][1])) for word in line["words"])
Contributor

I would suggest the following:

def synthesize_page(
    page: Dict[str, Any],
    draw_proba: bool = False,
    font_family: Optional[str] = None,
    adjust_to_line: bool = False,
) -> np.ndarray:
    """Draw the content of the element page (OCR response) on a blank page.

    Args:
    ----
        page: exported Page object to represent
        draw_proba: if True, draw words in colors to represent confidence. Blue: p=1, red: p=0
        font_family: family of the font
        adjust_to_line: if True, adjust y coordinates to line geometry

    Returns:
    -------
        the synthesized page
    """
    # Draw template
    h, w = page["dimensions"]
    response = 255 * np.ones((h, w, 3), dtype=np.int32)

    # Draw each word
    for block in page["blocks"]:
        multiline = len(block["lines"]) > 1
        for line in block["lines"]:
            for word in line["words"]:
                # Get absolute word geometry
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                xmin, xmax = int(round(w * xmin)), int(round(w * xmax))

                if multiline and adjust_to_line:
                    ymin = int(round(h * line["geometry"][0][1]))
                    ymax = int(round(h * line["geometry"][1][1]))
                else:
                    ymin, ymax = int(round(h * ymin)), int(round(h * ymax))

This way the user can still decide, and adjusting only makes sense when lines are available (so resolve_lines=True).
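
For illustration, the branch in the suggestion above can be isolated into a small standalone helper. This is only a sketch under assumptions: `word_box_pixels` is a hypothetical name that does not exist in docTR, and the dictionaries mimic the exported page layout used in the snippet, with relative `geometry` given as `((xmin, ymin), (xmax, ymax))`.

```python
from typing import Any, Dict, Tuple


def word_box_pixels(
    word: Dict[str, Any],
    line: Dict[str, Any],
    page_dims: Tuple[int, int],
    multiline: bool,
    adjust_to_line: bool = False,
) -> Tuple[int, int, int, int]:
    """Hypothetical helper: absolute (xmin, ymin, xmax, ymax) of a word, in pixels."""
    h, w = page_dims
    (xmin, ymin), (xmax, ymax) = word["geometry"]
    # The horizontal extent always comes from the word itself
    abs_xmin, abs_xmax = int(round(w * xmin)), int(round(w * xmax))
    if multiline and adjust_to_line:
        # Snap the vertical extent to the enclosing line
        abs_ymin = int(round(h * line["geometry"][0][1]))
        abs_ymax = int(round(h * line["geometry"][1][1]))
    else:
        # Keep the word's own vertical extent
        abs_ymin, abs_ymax = int(round(h * ymin)), int(round(h * ymax))
    return abs_xmin, abs_ymin, abs_xmax, abs_ymax


# Made-up example data, not taken from a real OCR export
example_word = {"value": "hello", "geometry": ((0.1, 0.2), (0.3, 0.25))}
example_line = {"geometry": ((0.1, 0.18), (0.9, 0.27)), "words": [example_word]}

adjusted = word_box_pixels(example_word, example_line, (100, 200), multiline=True, adjust_to_line=True)
unadjusted = word_box_pixels(example_word, example_line, (100, 200), multiline=True, adjust_to_line=False)
```

With `adjust_to_line=False` the helper reproduces the existing per-word behaviour, so the flag stays opt-in.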

Contributor Author

Okay, I will do that right away

            line_ymax = max(int(round(h * word["geometry"][1][1])) for word in line["words"])
            for word in line["words"]:
                # Get absolute word geometry
                (xmin, ymin), (xmax, ymax) = word["geometry"]
                xmin, xmax = int(round(w * xmin)), int(round(w * xmax))
                ymin, ymax = int(round(h * ymin)), int(round(h * ymax))

                # White drawing context adapted to font size, 0.75 factor to convert pts --> pix
                font = get_font(font_family, int(0.75 * (ymax - ymin)))
                ymin, ymax = line_ymin, line_ymax
                calculate_font_size = int(0.75 * (ymax - ymin))
Contributor

This still does not work well, see:

  • still overlaps

[Attached images: "2", "Screenshot from 2024-06-25 09-29-07", "test_page", "Screenshot from 2024-06-25 09-30-29"]

Contributor Author

That is interesting. I will debug this and find out why the issue occurs.

Contributor Author

Hey there, what models did you use for these?

Contributor

Hey :)

fast_base with parseq, and db_mobilenet_v3_large with crnn_mobilenet_v3_large

                font_size = 0.009 if calculate_font_size <= 0.009 else calculate_font_size
                font = get_font(font_family, font_size)
                img = Image.new("RGB", (xmax - xmin, ymax - ymin), color=(255, 255, 255))
                d = ImageDraw.Draw(img)
                # Draw in black the value of the word
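
The `line_ymin` / `line_ymax` lines in this hunk take the vertical extent of a line from its words. A minimal standalone sketch of that computation follows; `line_vertical_extent` is a hypothetical name used only for illustration, and the empty-line guard is an assumption that is not part of the diff.

```python
from typing import Any, Dict, Optional, Tuple


def line_vertical_extent(line: Dict[str, Any], page_height: int) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: (ymin, ymax) in pixels spanned by all words of a line."""
    words = line["words"]
    if not words:
        # Assumption: min()/max() would raise on an empty line, so report "no extent"
        return None
    # geometry is ((xmin, ymin), (xmax, ymax)) in relative coordinates
    tops = [int(round(page_height * word["geometry"][0][1])) for word in words]
    bottoms = [int(round(page_height * word["geometry"][1][1])) for word in words]
    return min(tops), max(bottoms)


# Made-up example data: two words with slightly different heights
example_line = {
    "words": [
        {"value": "foo", "geometry": ((0.10, 0.20), (0.20, 0.25))},
        {"value": "bar", "geometry": ((0.25, 0.22), (0.40, 0.30))},
    ]
}
extent = line_vertical_extent(example_line, page_height=100)
```

Every word on the line then shares the same `ymin` and `ymax`, which aligns the baselines but also makes each word's drawing box as tall as the tallest word.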
@@ -101,7 +105,9 @@ def synthesize_kie_page(
            ymin, ymax = int(round(h * ymin)), int(round(h * ymax))

            # White drawing context adapted to font size, 0.75 factor to convert pts --> pix
            font = get_font(font_family, int(0.75 * (ymax - ymin)))
            calculate_font_size = int(0.75 * (ymax - ymin))
            font_size = 0.009 if calculate_font_size <= 0.009 else calculate_font_size
            font = get_font(font_family, font_size)
            img = Image.new("RGB", (xmax - xmin, ymax - ymin), color=(255, 255, 255))
            d = ImageDraw.Draw(img)
            # Draw in black the value of the word
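
A note on the font-size guard in both hunks: `calculate_font_size` is already an `int`, so `calculate_font_size <= 0.009` is only true for values of 0 or below, and the fallback `0.009` is a sub-pixel float rather than an integer size. One possible alternative, sketched here as an assumption rather than as the change this PR made, is to clamp to a minimum integer size. `font_size_for_box` and `min_size` are hypothetical names.

```python
def font_size_for_box(ymin: int, ymax: int, min_size: int = 1) -> int:
    """Hypothetical helper: integer font size for a box of the given pixel height.

    Uses the same 0.75 pts-to-pixels factor as the diff, clamped to `min_size`.
    """
    return max(min_size, int(0.75 * (ymax - ymin)))


normal_size = font_size_for_box(10, 30)    # box 20 px high
degenerate_size = font_size_for_box(5, 5)  # zero-height box falls back to min_size
```

Keeping the result an `int` means it can be passed on unchanged wherever the original `int(0.75 * (ymax - ymin))` value was used.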