This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

Report plugin dependencies missing from the environment #54

Open
ChrisKeefe opened this issue Feb 15, 2022 · 3 comments

Comments

@ChrisKeefe
Collaborator

ChrisKeefe commented Feb 15, 2022

RESCRIPt is used in building our ready-made taxonomic classifiers, but is not shipped with the "core" distribution.

This is going to blow up EVERYONE's replay, making a provenance-aware replay-package-installation tool absolutely critical.
We could probably patch this by abusing the plugin manager on a special-case basis for RESCRIPt, but that's gross.

Edit: the following loses us more than it gains us.
The other approach here, which may be worth pursuing, is cutting the plugin manager out of replay entirely in the local Usage drivers, and letting replay do the best it can from provenance. This will be more permissive, but will leave users with no idea about which parameter names, for example, have changed.

@ChrisKeefe ChrisKeefe changed the title Deal with RESCRIPt being outside of "core" Fail fast when dependency plugins are missing from the environment Apr 29, 2022
@ChrisKeefe
Collaborator Author

Two possible paths forward here are:

Currently, failure does not occur until parsing is complete and the replay process begins. For large parsing jobs, this could mean many minutes of user time wasted. As such, I'm inclined toward the second option. #86 could also benefit from parse-time checking.

This is vaguely related to #77, which will catalog the software information we're checking against here.

@ChrisKeefe ChrisKeefe changed the title Fail fast when dependency plugins are missing from the environment Report plugin dependencies missing from the environment Apr 29, 2022
@ChrisKeefe
Collaborator Author

Providing a report of all missing plugins seems preferable to failing fast for a few reasons.

  • In the case of quick replays, failing fast saves very little time.
  • In the case of long replays, re-running the parse just to reach the next missing plugin would be awful.

A dedicated (and lower-overhead) parser that only cares about environment management could make fast failure a viable option (e.g. error message comes quickly, and directs the user to fix the env with the env management tool before re-running), but is probably a more complex solution than is necessary.

Assembled ProvDAGs can be queried as needed to check all expected plugins against whatever's in the current QIIME 2 environment before replay. As a perk, this means that the time cost of checking isn't added to parsing, which is already long.
Unfortunately, this means that CLI users will need to parse everything twice, because they don't have access to the in-memory DAG. This could add minutes of wait time to their work. (Improving the time cost of parsing per #29 is probably the best fix for this problem.)

Relying on the constructed DAG also means that future versions of replay that are equipped to replay across multiple conda environments could use the same data structure as their source.
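A minimal sketch of the "report everything that's missing" check, assuming the plugin name and version for each action can be pulled out of an assembled DAG. `ActionRecord`, `report_missing_plugins`, and `format_report` are hypothetical names for illustration, not part of provenance-lib, and the set of installed plugins is passed in rather than read from a real QIIME 2 environment. The version strings in the demo are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical stand-in for the action metadata of one provenance node."""
    node_id: str
    plugin: str
    plugin_version: str


def report_missing_plugins(records, installed):
    """Collect every plugin used in provenance that is not installed.

    Returns a dict mapping each missing plugin name to the sorted list of
    versions recorded for it, so the user sees all problems in one pass
    instead of hitting them one at a time.
    """
    missing = {}
    for rec in records:
        if rec.plugin not in installed:
            missing.setdefault(rec.plugin, set()).add(rec.plugin_version)
    return {name: sorted(versions) for name, versions in sorted(missing.items())}


def format_report(missing):
    """Render the missing-plugin mapping as a human-readable message."""
    if not missing:
        return "All plugins recorded in provenance are installed."
    lines = ["Plugins recorded in provenance but missing from this environment:"]
    for name, versions in missing.items():
        lines.append(f"  - {name} (recorded versions: {', '.join(versions)})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative data only: node ids and versions are invented.
    demo_records = [
        ActionRecord("a1", "rescript", "2021.11.0"),
        ActionRecord("a2", "clawback", "2021.11.0"),
        ActionRecord("a3", "feature-classifier", "2021.11.0"),
    ]
    print(format_report(report_missing_plugins(demo_records, {"feature-classifier"})))
```

Because the check only reads from already-collected records, it runs after parsing and before replay, and the same records could later be grouped per environment if replay ever spans multiple conda environments.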

@nbokulich
Member

hey while testing out provenance-lib today I found another plugin that this is an issue for: q2-clawback (some of the pre-trained classifiers in the Q2 data-resources use q2-clawback, but this plugin is not in the "core" distribution).

so this is already an issue with at least 2 plugins. The main case where this will be an issue, though, is if someone wants to replay provenance from a Q2 result (e.g., pulled from a publication or somewhere else online). It will take some trial and error to recreate the necessary env just to parse an existing result.
