Add H3N2 HA emerging clades #228

huddlej · 2024-09-11T22:05:14Z

Updates the Auspice JSON tree to include emerging clades for H3N2 HA including J.1.1, J.2.1, and J.2.2.

Related to nextstrain/seasonal-flu#181 which updates the Nextclade dataset workflow to produce these new annotations.

preview: https://master.clades.nextstrain.org/?dataset-server=gh:@add-h3n2-ha-emerging-clades@

ivan-aksamentov · 2024-09-11T22:15:29Z

data_output/index.json

-            "clades": 30,
+            "clades": 28,
            "customClades": {
-              "subclade": 36,
-              "short-clade": 30
+              "subclade": 34,
+              "short-clade": 28,
+              "emerging_subclade": 37


I noticed that the number of "big" clades, subclades and short clades (as counted on the tree nodes) all decreased by 2. Not sure if that's something expected or not.

Good eye. The old tree has clades 3C and 3C.2a1b, which are missing from the new tree. 3C had only one sample in the old tree and has no samples in the new tree. 3C.2a1b has no samples in either tree, but it was annotated in the old tree and is not annotated in the new tree. I suspect that the workflow dropped these clades during subsampling, as we sample more of the newer sequences.

(Which is to say that for the "recent H3N2 HA" dataset, those missing clades are not a blocking issue for this PR.)
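The comparison described above (counting distinct clade values on tree nodes and checking which ones disappeared) can be sketched as follows. This is a minimal illustration, not the code that produces `index.json`; it assumes the usual Auspice layout in which each node carries `node_attrs[<attr>]["value"]` and an optional `children` list, and the toy trees are made up for the example.

```python
def iter_nodes(root: dict):
    """Yield every node of an Auspice-style tree.

    Uses an explicit stack so that very deep trees do not hit
    Python's recursion limit.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("children", []))


def attr_values(root: dict, attr: str) -> set:
    """Collect the distinct values of one node attribute across the tree."""
    values = set()
    for node in iter_nodes(root):
        entry = node.get("node_attrs", {}).get(attr)
        if isinstance(entry, dict) and "value" in entry:
            values.add(entry["value"])
    return values


def dropped_values(old_root: dict, new_root: dict, attr: str) -> list:
    """Values of `attr` that occur in the old tree but not in the new one."""
    return sorted(attr_values(old_root, attr) - attr_values(new_root, attr))


# Toy trees (hypothetical); for a real file pass auspice_json["tree"].
old_tree = {
    "node_attrs": {"clade_membership": {"value": "3C"}},
    "children": [
        {"node_attrs": {"clade_membership": {"value": "3C.2a1b"}}},
        {"node_attrs": {"clade_membership": {"value": "J.2"}}},
    ],
}
new_tree = {"node_attrs": {"clade_membership": {"value": "J.2"}}}

print(dropped_values(old_tree, new_tree, "clade_membership"))
```

Running the same function with `attr="subclade"` or `attr="short-clade"` would cover the other two counters, assuming those attribute names match the keys stored on the nodes.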

In SC2 and mpox workflows, I force-include at least one representative sequence for each clade I want to include in a build so that all clades I want are represented. Maybe you could adopt some strategy like this to have less randomness involved?
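One way to sketch the force-include idea: pick one representative strain per required clade and write the names to a file that can be passed to a subsampling step (for example via the `--include` option of `augur filter`). The function name, the record layout, and the strain names below are hypothetical; the actual SC2 and mpox workflows may do this differently.

```python
def pick_representatives(records, required_clades):
    """Pick the first strain seen for each required clade.

    records: iterable of (strain, clade) pairs, e.g. read from metadata.
    Returns (representatives, missing), where `representatives` maps
    clade -> strain and `missing` lists required clades with no strain.
    """
    wanted = set(required_clades)
    representatives = {}
    for strain, clade in records:
        if clade in wanted and clade not in representatives:
            representatives[clade] = strain
    missing = sorted(wanted - set(representatives))
    return representatives, missing


# Hypothetical metadata rows: (strain name, clade).
records = [
    ("A/Example/1/2012", "3C"),
    ("A/Example/2/2024", "J.2"),
    ("A/Example/3/2013", "3C"),
]

reps, missing = pick_representatives(records, ["3C", "3C.2a1b"])

# One strain name per line, the format an include list usually takes.
include_lines = "\n".join(sorted(reps.values()))
print(include_lines)
print("no representative found for:", missing)
```

Reporting the `missing` clades explicitly makes it visible when a clade has no available sequence at all, which is a different situation from one that was dropped by random subsampling.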

corneliusroemer · 2024-10-16T14:27:58Z

data_output/nextstrain/flu/h3n2/ha/EPI1857216/unreleased/CHANGELOG.md

+
+## 2024-08-08T05:08:21Z
+
+Fix numbering of RBD sites it the `pathogen.json`. The relevant positions were indexed 1-based, when they should have been indexed 0-based.


Suggested change

-Fix numbering of RBD sites it the `pathogen.json`. The relevant positions were indexed 1-based, when they should have been indexed 0-based.
+Fix numbering of RBD sites in the `pathogen.json`. The relevant positions were indexed 1-based, when they should have been indexed 0-based.

corneliusroemer

This has been lying around for a month - was it waiting for anything in particular other than a merge from us? I've reviewed and have a few comments - not necessarily blocking but might be nice to address anyways.

I can't seem to sort by emerging subclade, why is that @ivan-aksamentov?

@huddlej what distinguishes an emerging subclade from a subclade? Why have that extra column? Does emerging mean provisional and hence what is meant by J.2.1 might change in the future? Otherwise why not just designate as a proper new clade?

Maybe the display name shouldn't be the underscored `emerging_subclade` but rather "Emerging subclade". A short description would also be nice for the tooltip; right now it's empty.

Something like this is possible:
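As a rough sketch of where such a display name and tooltip text could live: to my understanding, Nextclade reads clade-like attribute metadata from `meta.extensions.nextclade.clade_node_attrs` in the tree JSON, with `name`, `displayName` and `description` fields. Treat that layout as an assumption to verify against the Nextclade docs; the helper name and the description wording below are placeholders, not text from this PR.

```python
def describe_clade_attr(auspice_json: dict, name: str,
                        display_name: str, description: str) -> dict:
    """Set displayName/description for one clade-like attribute.

    Assumes the metadata list lives under
    meta.extensions.nextclade.clade_node_attrs; creates it if absent.
    """
    attrs = (
        auspice_json.setdefault("meta", {})
        .setdefault("extensions", {})
        .setdefault("nextclade", {})
        .setdefault("clade_node_attrs", [])
    )
    for entry in attrs:
        if entry.get("name") == name:
            entry["displayName"] = display_name
            entry["description"] = description
            return entry
    entry = {"name": name, "displayName": display_name,
             "description": description}
    attrs.append(entry)
    return entry


# Toy tree JSON with an undescribed attribute, as in the screenshot case.
tree_json = {"meta": {"extensions": {"nextclade": {
    "clade_node_attrs": [{"name": "emerging_subclade"}],
}}}}

describe_clade_attr(
    tree_json,
    name="emerging_subclade",
    display_name="Emerging subclade",
    description="Placeholder tooltip text describing the attribute.",
)
print(tree_json["meta"]["extensions"]["nextclade"]["clade_node_attrs"])
```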

Lastly, it would be nice to maybe add some new example sequences that are part of these new emerging clades.

[Screenshot not preserved: the tree colored by emerging clades.]

tsibley · 2024-10-16T17:17:38Z

I can't seem to sort by emerging subclade, why is that @ivan-aksamentov?

It's literally because the text doesn't wrap, which forces the sort asc/desc icons out of view. If I make the text wrap, you can see/use them.

It's conventional to allow clicking the column name/text itself to toggle thru sort state (asc, desc, none), which would at least restore functionality if not the indicators.
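The click-to-cycle convention can be sketched as a small state machine. This is a language-neutral illustration written in Python, not Nextclade's table code; the names `SortState`, `next_sort_state` and `apply_sort` are hypothetical.

```python
from enum import Enum


class SortState(Enum):
    NONE = "none"
    ASC = "asc"
    DESC = "desc"


# Each click on the column name advances: none -> asc -> desc -> none.
_NEXT = {
    SortState.NONE: SortState.ASC,
    SortState.ASC: SortState.DESC,
    SortState.DESC: SortState.NONE,
}


def next_sort_state(state: SortState) -> SortState:
    """Return the state that follows a click on the column header."""
    return _NEXT[state]


def apply_sort(rows: list, key: str, state: SortState) -> list:
    """Return rows in the order implied by the current sort state."""
    if state is SortState.NONE:
        return list(rows)  # keep the original order
    return sorted(rows, key=lambda row: row[key],
                  reverse=(state is SortState.DESC))


rows = [{"emerging_subclade": "J.2.1"},
        {"emerging_subclade": "J.1.1"},
        {"emerging_subclade": "J.2.2"}]

state = SortState.NONE
for _ in range(3):  # simulate three clicks on the header
    state = next_sort_state(state)
    ordered = [r["emerging_subclade"] for r in apply_sort(rows, "emerging_subclade", state)]
    print(state.value, ordered)
```

Because the state lives with the column rather than with the arrow icons, sorting keeps working even when the icons are pushed out of view.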

ivan-aksamentov · 2024-10-16T19:05:08Z

Yep, if the text does not fit, it pushes the arrows out of view; this is a CSS bug. (As a funny workaround, you can scroll them back in if you select the text and drag the selection all the way to the right.) The easiest fix is to pick names made of short words or even abbreviations/acronyms, separated by spaces instead of underscores. The explanation can be tucked into the details in the tooltip.

But it's true that I need to return to the table at some point; it is one of the oldest components and can definitely use some love.

There is some more discussion in nextstrain/nextclade#1537.

huddlej · 2024-10-16T20:48:31Z

This has been lying around for a month - was it waiting for anything in particular other than a merge from us?

@corneliusroemer I shared some initial context in a related issue that may be helpful background for this PR.

This PR is waiting on two things:

- A synchronous discussion between at least @rneher and me about whether this is the right solution to the issue of emerging subclades. I'm not as convinced now that I've used it for a month. I think I'd prefer a way to keep using the same "subclade" field but define new subclades in a prerelease Nextclade dataset that we could use in our reporting and users could opt into through the website. I was hoping to use an upcoming Nextstrain biweekly meeting to chat about this general issue.
- Inclusion of representative sequences from older clades to avoid loss of those clades in the main H3N2 HA dataset (the approach you described above is what I was planning to do).

corneliusroemer · 2024-10-16T21:21:46Z

Thanks @huddlej for the response. A PR here is enough to have a "prerelease" dataset that's available through, for example: https://master.clades.nextstrain.org/?dataset-server=gh:@add-h3n2-ha-emerging-clades@ (and through an equivalent invocation of `nextclade dataset get` with a command-line argument specifying the server).
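A small sketch of both access paths, under stated assumptions: the preview URL pattern is copied from the link above, and the CLI arguments use `nextclade dataset get` with `--name`, `--server` and `--output-dir`. The server URL passed to the CLI is a placeholder, since the address that serves this branch's `data_output` is not given in this thread.

```python
def preview_url(branch: str) -> str:
    """Web preview link for a branch, following the pattern quoted above."""
    return f"https://master.clades.nextstrain.org/?dataset-server=gh:@{branch}@"


def dataset_get_args(name: str, server: str, output_dir: str) -> list:
    """Argument list for fetching a dataset from a non-default server.

    `server` must point at the dataset index of the branch; the value
    used below is a placeholder, not a real endpoint.
    """
    return ["nextclade", "dataset", "get",
            "--name", name,
            "--server", server,
            "--output-dir", output_dir]


print(preview_url("add-h3n2-ha-emerging-clades"))

args = dataset_get_args(
    name="nextstrain/flu/h3n2/ha/EPI1857216",
    server="https://example.org/placeholder/data_output",  # assumption
    output_dir="data/h3n2_ha_prerelease",                  # hypothetical path
)
print(" ".join(args))
# To actually run it: subprocess.run(args, check=True)
```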

I'll convert this PR to draft state then as it's not actually ready to be reviewed/merged at this point in time.

huddlej · 2024-10-16T21:28:24Z

a PR here is enough to have a "prerelease" dataset that's available through for example

I was hoping for something a little more visible to users of the web UI, like an H3N2 HA dataset with both "official" and "experimental" labels on the production website. This would allow folks to use emerging annotations ahead of the various WHO meetings but before they've been released officially. But I'm happy to discuss any potential solutions.

ivan-aksamentov · 2024-10-16T22:59:10Z

Few thoughts:

If this change provides a sufficiently different approach compared to what most users will use, then perhaps it could be a separate dataset? Think of it as a "fork". For example, we could have
```
nextstrain/flu/h3n2/ha/EPI1857216/default
nextstrain/flu/h3n2/ha/EPI1857216/experimental
```
or
```
nextstrain/flu/h3n2/default/ha/EPI1857216
nextstrain/flu/h3n2/experimental/ha/EPI1857216
```
or whatever the paths/names/flavors you think make sense. The old paths need to be added to the shortcuts for backward compat.

A disadvantage is that both datasets will have to be maintained in sync. You might update default but forget to update experimental - resulting in default being ahead of experimental.

This approach can also be used if no consensus is found on the team - John could just create a community/huddlej/ sub-directory and add his stuff there :)

Or perhaps a new collection nextstrain-experimental/ which will also be considered as "official"?

I just realized that the separate column may not be such a bad idea: the new column is like a "beta" version of clades, and "beta" clades periodically "graduate" to the clade column proper. This way the 2 nomenclatures are always in sync. But that's up to science folks to decide of course; there are considerations way beyond just paths and JSONs.

Improve software: introduce dataset pre-releases and allow users to pick dataset versions, either pre-release/release or even concrete versions. In CLI, tags can already be selected. However, to implement pre-releases we will also need some kind of flag for each tag, so that pre-releases are not picked as the default when the tag is not specified. This will likely be a breaking change for CLI. In Web we can do whatever we want; we don't have to maintain a stable interface there.
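The per-tag flag idea can be sketched like this: each version carries a `prerelease` marker, an explicit tag always wins, and the default resolution skips pre-releases unless the user opts in. This is a hypothetical model, not Nextclade's index format; the class, the function, and the tag strings are invented for illustration, and tags are assumed to sort chronologically as strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    tag: str                  # assumed to sort chronologically as a string
    prerelease: bool = False  # the hypothetical per-tag flag


def resolve_version(versions: list, tag: Optional[str] = None,
                    allow_prerelease: bool = False) -> DatasetVersion:
    """Pick a dataset version.

    An explicitly requested tag is returned even if it is a pre-release.
    Without a tag, the newest release is chosen; pre-releases are only
    considered when allow_prerelease is True.
    """
    if tag is not None:
        for version in versions:
            if version.tag == tag:
                return version
        raise KeyError(f"unknown tag: {tag}")
    candidates = [v for v in versions if allow_prerelease or not v.prerelease]
    if not candidates:
        raise LookupError("no matching dataset version")
    return max(candidates, key=lambda v: v.tag)


# Hypothetical tags for illustration only.
versions = [
    DatasetVersion("2024-08-08--05-08-21Z"),
    DatasetVersion("2024-09-11--22-05-14Z", prerelease=True),
]

print(resolve_version(versions).tag)
print(resolve_version(versions, allow_prerelease=True).tag)
```

The breaking-change concern mentioned above corresponds to the default branch: clients that do not know about the flag would treat the newest tag as the default, pre-release or not.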

huddlej added 2 commits September 11, 2024 15:01

Add emerging clades to recent H3N2 HA dataset

6d77d94

Update changelog

a02b904

huddlej deployed to refs/pull/228/merge with GitHub Actions on September 11, 2024 22:05

chore: rebuild [skip ci]

4ef5b41

ivan-aksamentov reviewed Sep 11, 2024

corneliusroemer reviewed Oct 16, 2024

corneliusroemer mentioned this pull request Oct 16, 2024

Allow toggling of sort state by clicking on column name/text itself nextstrain/nextclade#1537

Open

corneliusroemer marked this pull request as draft October 16, 2024 21:21
