Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude deps & split into optional parts #1551

Merged
merged 13 commits into from
Apr 23, 2024

Conversation

felixdittrich92
Copy link
Contributor

@felixdittrich92 felixdittrich92 commented Apr 14, 2024

This PR:

  • exclude weasyprint / matplotlib & mplcursors from base package -> optional-deps
  • split synth and visualization into different modules for easier handling

Any feedback is welcome 🤗

Closes: #1052 #1540 #1545

@felixdittrich92 felixdittrich92 changed the title Optional dep split [Draft] Exclude deps & split into optional parts Apr 14, 2024
@felixdittrich92 felixdittrich92 self-assigned this Apr 14, 2024
@felixdittrich92 felixdittrich92 added this to the 0.9.0 milestone Apr 14, 2024
@felixdittrich92 felixdittrich92 added topic: documentation Improvements or additions to documentation topic: build Related to dependencies and build topic: ci Related to CI module: utils Related to doctr.utils os: windows Involving Windows users os: mac Involving Apple MAC users type: misc Miscellaneous ext: docs Related to docs folder labels Apr 14, 2024
doctr/file_utils.py Outdated Show resolved Hide resolved
@@ -23,7 +23,7 @@ jobs:
matrix:
# Must match version at https://www.python.org/ftp/python/
python: ["3.9.18", "3.10.13", "3.11.8"]
framework: ["tf", "torch"]
framework: ["tf", "torch", "tf-extras", "torch-extras"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frgfm @odulcy-mindee Not sure do we want this as "prod" "non prod" (with -extras) !? 😅

pyproject.toml Outdated Show resolved Hide resolved
@felixdittrich92 felixdittrich92 marked this pull request as ready for review April 15, 2024 09:01
Copy link
Collaborator

@frgfm frgfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I haven't reviewed all files, only the extra def part, which I think we should handle before the rest. Probably doing separate orthogonal extras, and then figuring out a good logic to combine those or add proper installation instructions

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@felixdittrich92
Copy link
Contributor Author

Thanks for the PR! I haven't reviewed all files, only the extra def part, which I think we should handle before the rest. Probably doing separate orthogonal extras, and then figuring out a good logic to combine those or add proper installation instructions

Mh i think we should keep the dependency split and handling the optional dependencies in the same PR (at the end it's not much new logic) because it's in context and so it's easier to get back to it.
About the different install options i was also thinking about but at the end i would say it's better to reduce the different options to base and full (with optional deps) because otherwise the user can install optional dependencies on it's own if required/needed (the exception messages are clear enough and easy to understand additional this impacts only .show() and .from_url() which is also updated in the readme with a comment # requires weasyprint to be installed).
For the future i don't think that we will add much more optional dependencies so i would say it's better to keep the installation jungle to a minimum 😅
@odulcy-mindee wdyt ?

@felixdittrich92 felixdittrich92 changed the title [Draft] Exclude deps & split into optional parts Exclude deps & split into optional parts Apr 18, 2024
doctr/io/elements.py Show resolved Hide resolved
doctr/io/reader.py Show resolved Hide resolved
doctr/utils/reconstitution.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@felixdittrich92
Copy link
Contributor Author

moved to viz and html as options :)

i would like to keep the "full" installations for the CI jobs instead of cherry picking the install versions depending on the jobs

api tests would fail if there is an export/import problem because the api installs only python-doctr[tf] without weasyprint / matplotlip & mplcursors

Copy link
Collaborator

@odulcy-mindee odulcy-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Waiting for @frgfm's comment

@felixdittrich92
Copy link
Contributor Author

@frgfm feel free to comment later on 😊 @odulcy-mindee as discussed merge today

@felixdittrich92 felixdittrich92 merged commit f21ac32 into mindee:main Apr 23, 2024
79 of 80 checks passed
@felixdittrich92 felixdittrich92 deleted the optional-dep-split branch April 23, 2024 03:03
Copy link
Collaborator

@frgfm frgfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Late comments, nothing major :)

README.md Show resolved Hide resolved
Comment on lines +344 to +352

headers = {"accept": "application/json"}
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/path/to/your/doc.jpg', 'rb') as f:
data = f.read()
response = requests.post("http://localhost:8002/ocr", files={'file': data}).json()
files = [ # application/pdf, image/jpeg, image/png supported
("files", ("doc.jpg", f.read(), "image/jpeg")),
]
print(requests.post("http://localhost:8080/ocr", headers=headers, params=params, files=files).json())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, it's better to have the shortest snippet possible for users to try it out. We went from 3 LoC to 5 excluding formatting, and this PR doesn't change the minimal backend API so I'm confused as to why there is a change in the "files" passed 🤔 (if that was changed in another PR, I think we should avoid putting it in this PR 😅 )

A few suggestions:

  • either we identify the N°1 thing people want to modulate in our snippet (say model or file type) and we pass explicitly without changing much (the path to the file in my opinion)
  • or we stick to the original version

Questions/notes:

  • there is a typo above where we pass the port as env var (doesn't work actually)
  • is the accept header really necessary here? params aren't JSON data (it formats the URL) and I don't think files are as well
  • are the params necessary? Let's perhaps leave the default?
  • assuming some people will copy paste, I'd suggest storing the result in response and optionally print it so that people don't have to run it again to get the response

This is a minor detail in the scope of the PR, but maintaining accessibility & readibility of snippet is important for adoption :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the API has changed in #1522 (with multi file and params support)

Yeah i agree that has nothing to do with the PR (it's something i missed with the API change PR)

  1. Let me check about the port :)
  2. and 3. params & header are optional but about the params i think this shows well that a user can paramterize the request !?
  3. Yeah sure :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, the new syntax is a bit complex for file handling 😅

Only one final thing: I did use the right port but when uploading a file using this snippet, I'm getting errors on the backend side saying it cannot access the .content_type of the UploadFile. I haven't the other PR, so I don't know whether this is some fastapi/starlette version issue or a backend issue but I think there is a problem (if you managed to run it locally, it's quite likely this is a version constraint issue)

Copy link
Contributor Author

@felixdittrich92 felixdittrich92 Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but it's the only way to upload multiple files at onces.

Yeah i tested it again without any issues (also locally)

fastapi                   0.110.0                  pypi_0    pypi
pydantic                  2.6.4                    pypi_0    pypi
pydantic-core             2.16.3                   pypi_0    pypi

which versions have you installed ?

docs/source/getting_started/installing.rst Show resolved Hide resolved
@Suvoo
Copy link

Suvoo commented Jun 5, 2024

This PR:

  • exclude weasyprint / matplotlib & mplcursors from base package -> optional-deps
  • split synth and visualization into different modules for easier handling

Any feedback is welcome 🤗

Closes: #1052 #1540 #1545

Hey, Are there any plans to release this update soon? As the current "pip versions" have the weasyprint issue or else are you planning to keep this only a part of building from source (cloning the repo)?

@felixdittrich92
Copy link
Contributor Author

This PR:

  • exclude weasyprint / matplotlib & mplcursors from base package -> optional-deps
  • split synth and visualization into different modules for easier handling

Any feedback is welcome 🤗
Closes: #1052 #1540 #1545

Hey, Are there any plans to release this update soon? As the current "pip versions" have the weasyprint issue or else are you planning to keep this only a part of building from source (cloning the repo)?

Hey @Suvoo 👋,
We have planned to do the next release end of July / beginning August :)

@Suvoo
Copy link

Suvoo commented Jun 5, 2024

Thanks, will use source build till then👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ext: docs Related to docs folder module: utils Related to doctr.utils os: mac Involving Apple MAC users os: windows Involving Windows users topic: build Related to dependencies and build topic: ci Related to CI topic: documentation Improvements or additions to documentation type: misc Miscellaneous
Projects
None yet
4 participants