Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow build-specific --narrow-bandwidth param in frequencies #1130

Merged
merged 3 commits into from
Oct 3, 2024

Conversation

trvrb
Copy link
Member

@trvrb trvrb commented Jul 26, 2024

Description of proposed changes

This is a bug fix PR. There was a bug in the current behavior where the rule frequencies was calling config["frequencies"]["narrow_bandwidth"]. This resulted in --narrow-bandwidth to be fixed to whatever was specified in parameters.yaml. However, we were relying on build-specific settings to differentiate behavior in the all-time vs 6m vs 2m vs 1m builds, ala:

frequencies
    global_1m:
      narrow_bandwidth: 0.019

This PR fixes this issue and allows overriding of narrow_bandwidth in parameters.yaml by build-specific settings. It also provides a genuine default (equal to augur default) so that parameters.yaml doesn't have to always specify narrow_bandwidth.

Additionally, this PR removes the build-specific setting for recent_days_to_censor. This setting wasn't actually being applied in a build-specific fashion and so we had been always using the workflow default of 0. I decided to stick with this.

Testing

I've tested locally and via trial build. Trail builds are visible at:

There was a bug in the current behavior where the rule `frequencies` was calling `config["frequencies"]["narrow_bandwidth"]`. This resulted in --narrow-bandwidth to be fixed to whatever was specified in parameters.yaml. However, we were relying on build-specific settings to differentiate behavior in the all-time vs 6m vs 2m vs 1m builds, ala:

```
frequencies
    global_1m:
      narrow_bandwidth: 0.019
```

This commit fixes this issue and allows overriding of narrow_bandwidth in parameters.yaml by build-specific settings. It also provides a genuine default (equal to augur default) so that parameters.yaml doesn't have to always specify narrow_bandwidth.
@trvrb trvrb force-pushed the build-specific-narrow-bandwidth branch from 1363c19 to 3d8bed9 Compare September 26, 2024 23:39
This removes `recent_days_to_censor` from GISAID/21L and open builds to match removal from GISAID builds.
@trvrb trvrb force-pushed the build-specific-narrow-bandwidth branch from 8eca4ee to f8081fe Compare September 27, 2024 00:30
This follows recommendation in issue #1131 to use the pattern

frequencies:
  default:
    min_date: "6M"
    narrow_bandwidth: 0.038

This use of "default" extends just to min_date, max_date and narrow_bandwidth. This behavior is now documented in parameters.yaml as well as workflow-config-file.rst.

The specification of frequencies parameters in builds.yaml now follows this pattern.
@trvrb trvrb force-pushed the build-specific-narrow-bandwidth branch from f8081fe to 6458357 Compare September 27, 2024 00:33
@trvrb
Copy link
Member Author

trvrb commented Sep 27, 2024

Okay. I've updated this PR to follow "strategy 2" from #1131 and use default to specify build-specific defaults. I've updated parameters.yaml and workflow-config-file.rst to spell out what parameters can go under default and what are global.

I've also re-triggered trial builds with https://github.com/nextstrain/ncov/actions/runs/11062321363.

@victorlin: Could you review this when you have a moment? I think it should be ready to be merged.

Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good – one small thing and a couple non-blocking comments. I can make these changes if it helps.



# settings that can be over-ridden across all builds, but not for specific builds
recent_days_to_censor: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is now defined in two places: the config file (where this comment is added) and in the workflow:

recent_days_to_censor = config.get("frequencies", {}).get("recent_days_to_censor", 0)

Since all workflow invocations¹ source from defaults/parameters.yaml, the default value defined in common.smk never gets used. I suggest removing it:

recent_days_to_censor = config["frequencies"]["recent_days_to_censor"]

¹ under expected usage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I liked having workflow default in addition to the parameters.yaml default. As you point out this redundancy also exists for frequencies.narrow_bandwidth.

But perhaps I understand your concern, where by having multiple layers of "defaults" it might add confusion. It would be nice if people could look just at parameters.yaml to understand defaults rather than digging into the workflow.

I think the original push for adding workflow redundancy here was that parameters.yaml (still) doesn't include recent_days_to_censor and if we start looking for it in the workflow and people are stilling using old parameters.yaml files then things will break.

I think we need a broader pattern to adhere to here (kind of like the conversion in issue #1131). I'm not immediately sure what's best for this specific PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we start looking for it in the workflow and people are stilling using old parameters.yaml files then things will break.

I'm not familiar with external usage of ncov so I appreciate the thoughts here.

Since parameters.yaml is under version control, shouldn't it be sufficient to have workflow changes + parameters.yaml changes (adding recent_days_to_censor) in the same PR? If someone pulls the workflow changes, they'll also pull the default recent_days_to_censor in parameters.yaml. I can think of a few possibilities where breakage might happen:

  1. User has made custom changes to parameters.yaml. git pull will try to auto-merge and and flag any merge conflicts.

  2. User has explicitly removed this line from Snakefile:

    ncov/Snakefile

    Line 42 in 82ca8d4

    configfile: "defaults/parameters.yaml"

  3. User is using a profile that does not source from parameters.yaml, i.e. missing this line:

    configfile:
    - defaults/parameters.yaml

    This seems the most likely for external usage, and I believe it's one of the reasons why we now recommend --configfile over --profile nowadays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a broader pattern to adhere to here

The broader pattern seems to be having a default in parameters.yaml + direct access in Snakemake. This is how it's done for other config:

pivot_interval = config["frequencies"]["pivot_interval"],
pivot_interval_units = config["frequencies"]["pivot_interval_units"],
narrow_bandwidth = config["frequencies"]["narrow_bandwidth"],
proportion_wide = config["frequencies"]["proportion_wide"]

# Number of weeks between pivots
pivot_interval: 1
# Measure pivots in weeks
pivot_interval_units: "weeks"
# KDE bandwidths in proportion of a year to use per strain.
# using 15 day bandwidth
narrow_bandwidth: 0.041
proportion_wide: 0.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thoughts. I agree with your logic. Though I would like to push the fixes to a separate PR.

Comment on lines +217 to +219
# else return augur frequencies default value
else:
return 0.0833
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking)

Suggested change
# else return augur frequencies default value
else:
return 0.0833

This shouldn't be necessary since defaults/parameters.yaml already defines the default value and should always be used¹.

Non-blocking because _get_min_date_for_frequencies already has similar redundancy.

¹ under expected usage

max_date: "2022-01-01"
narrow_bandwidth: 0.076

Each named traits configuration (``default`` or build-named) supports specification of ``min_date``, ``max_date`` and ``narrow_bandwidth``. Other parameters can only be specified across all builds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking)

Would it be worth allowing build-specific recent_days_to_censor, pivot_interval, etc. as well? If done right, this could reduce the duplication of functions _get_min_date_for_frequencies/_get_max_date_for_frequencies/_get_narrow_bandwidth_for_wildcards.

Obviously beyond the scope this PR – I could take this or at least make an issue to start it off.

Copy link
Member Author

@trvrb trvrb Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally. Ideally, every entry in parameters.yaml would be able to be over-ridden with build-specific entries in builds.yaml. As you say, the current Snakemake workflow strategy here however would require padding out individual functions for each parameter. It would be wonderful to have a generic strategy that makes for automatic over-rides.

I think maybe more important than making this work for ncov is thinking through how to do this right across pathogens. Everyone requests the ncov style parameter-overrides (this came up for mpox most obviously). cc @joverlee521

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's definitely been a lot of discussions around build configs outside of ncov. I'll take some time to wrangle previous discussions and write up a summary outside of this PR (probably in pathogen-repo-guide).

@victorlin victorlin mentioned this pull request Oct 2, 2024
1 task
@trvrb trvrb merged commit faa85a7 into master Oct 3, 2024
7 checks passed
@trvrb trvrb deleted the build-specific-narrow-bandwidth branch October 3, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants