We're using pure Python parsing for two reasons:

- It's required for our cross-platform, vendor-everything installation story.
- It produces more helpful error messages for invalid YAML.
But for very large projects involving megabytes of YAML it's intolerably slow. This makes `opensafely run` very difficult to use locally, and makes it impossible to dispatch jobs in production because the page times out.

A short-term workaround would be for the pipeline library to first attempt to parse the YAML using a fast, compiled parser (if one is importable), and only if that produces errors to re-parse it using the pure Python parser to get the helpful error messages.
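In outline, what I'm proposing is: try the fast parser first and return its result if it succeeds; if it raises an error, re-parse the same text with the pure Python parser so that its more helpful error message is the one the user sees.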
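The proposal can be sketched in Python roughly as follows. This is a minimal sketch, not the pipeline library's code. It assumes PyYAML's `CSafeLoader` as the fast parser (that class only exists when PyYAML is built with its libyaml bindings), and it takes the existing pure Python parser as a `slow_parser` argument rather than assuming which library that is. The names `parse_yaml`, `FAST_PARSER`, `Parser` and `_load_fast_parser` are hypothetical.

```python
from typing import Any, Callable, Optional

# Hypothetical type alias: a parser takes YAML text and returns Python objects.
Parser = Callable[[str], Any]


def _load_fast_parser() -> Optional[Parser]:
    """Return a compiled (libyaml-backed) YAML parser if one is importable, else None."""
    try:
        import yaml  # PyYAML: treated as an optional dependency in this sketch
    except ImportError:
        return None
    # CSafeLoader is only defined when PyYAML was built with libyaml bindings.
    loader = getattr(yaml, "CSafeLoader", None)
    if loader is None:
        return None

    def parse(text: str) -> Any:
        return yaml.load(text, Loader=loader)

    return parse


# Resolved once at import time, so the import check isn't repeated on every call.
FAST_PARSER: Optional[Parser] = _load_fast_parser()


def parse_yaml(
    text: str, slow_parser: Parser, fast_parser: Optional[Parser] = FAST_PARSER
) -> Any:
    """Parse with the fast parser when possible; fall back to the slow one on errors.

    Happy path: the fast parser succeeds and the slow parser never runs.
    Unhappy path: the fast parser fails, we re-parse with the slow parser,
    and its exception (with the more helpful message) propagates to the caller.
    """
    if fast_parser is not None:
        try:
            return fast_parser(text)
        except Exception:
            # Deliberately discard the fast parser's error; the slow parser's is better.
            pass
    return slow_parser(text)
```

Catching a broad `Exception` keeps the sketch independent of any one parser's error type; with PyYAML alone, `yaml.YAMLError` would be the narrower choice. One thing worth checking before adopting this: the fast path only helps if both parsers agree on valid input. PyYAML implements YAML 1.1 and silently keeps the last of any duplicate keys, so if the pure Python parser follows YAML 1.2 or rejects duplicate keys, some files could parse differently (or pass without an error) depending on whether the fast parser is installed.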
This does make the unhappy path slower, but not by much, since the extra fast parse is cheap compared with the pure Python one. And it would massively speed up the happy path, assuming that a fast parser is importable. There are three contexts in which we need to make sure one is.
1. Job Server
Here it looks like `pyyaml` is already available, so there'd be nothing more to do than upgrade the pipeline library: https://github.com/opensafely-core/job-server/blob/4814a7a17d42c55a508fb527c2b3c5a9121027c3/requirements.prod.txt#L794
2. Codespaces
I'm not exactly sure of the mechanics here, but presumably we can use whatever mechanism we use to ensure that `opensafely-cli` is installed to also install `pyyaml`.

3. Running locally
This is obviously the hardest part. I think in the first instance we'd just need to talk the affected users through installing `pyyaml` (or whichever fast parser we choose). That's obviously not sustainable, but it makes it practical right now for these users to interact with their projects locally, which I think is really important.

Longer term, if we move to using `uv` for local installation, the need to keep all our dependencies as pure Python goes away.
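When talking users through the installation, it may help to have a quick way to confirm that a fast parser will actually be picked up. This is a minimal sketch assuming PyYAML is the parser we choose; `fast_yaml_status` is a hypothetical helper, not part of `opensafely-cli`.

```python
import sys


def fast_yaml_status() -> str:
    """Report whether a compiled YAML parser is importable in this interpreter."""
    try:
        import yaml  # PyYAML
    except ImportError:
        return "PyYAML is not installed: only the pure Python parser is available"
    # PyYAML sets __with_libyaml__ to True only if its libyaml bindings imported.
    if getattr(yaml, "__with_libyaml__", False):
        return f"PyYAML {yaml.__version__} with libyaml: fast parser available"
    return f"PyYAML {yaml.__version__} without libyaml: fast parser unavailable"


if __name__ == "__main__":
    # Print the interpreter path too: pyyaml has to be installed into the same
    # Python environment that opensafely-cli runs in, not just any Python.
    print(sys.executable)
    print(fast_yaml_status())
```

A PyYAML install without the libyaml bindings would import fine but still leave only the slow path available, which is why the check looks at `__with_libyaml__` rather than just whether `yaml` imports.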