Continuous development infrastructure for python #353

Closed
wants to merge 22 commits

Conversation

jerinphilip
Contributor

@jerinphilip jerinphilip commented Feb 15, 2022

This is a work in progress.

This is an exploratory undertaking that attempts to add a missing piece to the python ecosystem provisioned from this repository. Not everything in this PR is expected to make it in (hence the experimental label). We are also going to use this as a "big" PR to add some useful, tested features and to better shape the abstractions that repeat across tests.

The following enhancements are to be explored:

  1. Python is a dynamic language. pytype was previously added for some static type-checking, which worked well. However, there are parts of the code we need to know run as intended and are covered by tests. For example, despite the existing checks, an issue like Python bindings for alignment do not (yet?) work #352 went unnoticed. Towards this, we tentatively integrate coverage (to measure how much of the python codebase our tests/runs cover) and pytest (a framework to maintain tests) into the python subsystem (see the sketch after this list).
  2. Fit pytest to run unit-test-like checks that report numbers for the HTML feature (Continuous checks and evaluation of HTML translation feature #331, https://github.com/jerinphilip/tagtransfer/blob/master/tagtransfer/xml_eval.py).
  3. Support future plans to create scoreboards for a larger pool of available models from a repository like translateLocally or OPUS. The bergamot.REPOSITORY object is intended to aggregate across multiple repositories and can be used to continuously test that the models work as intended through the course of development here. A simple idea is to pass something like 1 2 3 4 5 6 7 8 9 through all available models and assert a successful run as a smoke test (also sketched below). This should cover a lot of ground.
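To make items 1 and 3 concrete, a minimal sketch of such a smoke test under pytest could look like the following. The `REPOSITORY.sources()`, `REPOSITORY.models()` and `Service.translate()` names are hypothetical placeholders for whatever the bergamot python API ends up exposing; the shape is what matters: parametrize over every available model, push a trivial input through, and assert the run completes. Running it as `python -m pytest --cov=bergamot` (with pytest-cov installed) would also produce the coverage numbers from item 1.

```python
# test_models_smoke.py -- exploratory sketch; the bergamot API names used
# below (REPOSITORY.sources/models, Service.translate) are placeholders.
import pytest

import bergamot

# Enumerate every (repository, model) pair once so pytest reports one test per model.
MODELS = [
    (source, model)
    for source in bergamot.REPOSITORY.sources()      # hypothetical
    for model in bergamot.REPOSITORY.models(source)  # hypothetical
]


@pytest.mark.parametrize("source, model", MODELS)
def test_smoke(source, model):
    """Pass a trivial numeric input through each model and assert a successful run."""
    service = bergamot.Service()                     # hypothetical
    translated = service.translate(model, "1 2 3 4 5 6 7 8 9")
    # No golden output here; a non-empty response from a completed run is the invariant.
    assert translated
```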

Additional desiderata

  • We don't want to be tied to GitHub CI or use it more than necessary. Certain heavy tasks (like continuously testing all available models) can be offloaded to a more dedicated machine and run only at critical points.
  • Have an automated page (a sort of tracking dashboard indicating progress) where the coverage stats and HTML scores are available for quick lookup.
  • Developers should be able to selectively run subsets of the tests added here rather than the entire suite. For example, someone working on HTML should only need to run the HTML tests (see the sketch after this list).
  • We want to avoid hard "expected output == produced output" constraints across tests here. Instead, we should find invariants that hold across models, languages, and HTML pages, and reduce strict asserts to scalar scores calibrated to not drop below a threshold (e.g. HTML).
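As a sketch of the last two points: pytest markers let a developer working on HTML run only those tests (`pytest -m html`), and the assert can be a calibrated lower bound on a scalar score instead of exact-output matching. The `translate_html` and `tag_transfer_score` helpers below are hypothetical stand-ins; a real metric would be along the lines of tagtransfer/xml_eval.py.

```python
# test_html.py -- exploratory sketch; translate_html and tag_transfer_score
# are hypothetical helpers, not an existing API.
import pytest

from tests.html_helpers import translate_html, tag_transfer_score  # placeholder module

# Calibrated lower bound: meant to catch regressions without pinning exact output.
HTML_SCORE_THRESHOLD = 0.95

# A real suite would load a corpus of HTML pages instead of this toy example.
HTML_PAGES = ["<p>Hello <b>world</b></p>"]


@pytest.mark.html  # register the marker under `markers` in pytest.ini / pyproject.toml
@pytest.mark.parametrize("page", HTML_PAGES)
def test_tag_transfer_score(page):
    translated = translate_html(page)
    score = tag_transfer_score(page, translated)
    assert score >= HTML_SCORE_THRESHOLD
```

Selecting only these with `pytest -m html` keeps the feedback loop short for HTML work, while the full suite (including the model smoke tests above) can run on the dedicated machine.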

(Tentatively) Fixes: #331, #352

@jerinphilip jerinphilip added the experimental Experimental stuff, might make it in might not label Feb 15, 2022
@jerinphilip jerinphilip linked an issue Feb 19, 2022 that may be closed by this pull request
Change README.md to reflect wider capabilities (C++ library, Python,
WebAssembly). Move the bulk of WebAssembly-specific instructions to
`wasm/README.md`.
@jerinphilip jerinphilip closed this Jul 8, 2022