Summarize haplotype coverage by titer references using frequencies per haplotype from all available data #173
Merged
Adds a prototype script that produces a derived haplotype string per record from a Nextclade annotations file with columns for clade and for mutations relative to each clade. The derived haplotypes produced by this script could eventually replace the haplotypes we build from the mutation-annotated trees, which would let us calculate haplotype frequencies from all available data instead of only the subset of data used to build a tree.
Related to #130
Depends on nextstrain/nextclade#1492
huddlej changed the title from "Stub script for derived haplotypes from Nextclade" to "Summarize haplotype coverage by titer references using frequencies per haplotype from all available data" on Jul 5, 2024.
Replaces the in-script filtering of Nextclade records by QC status with a separate workflow rule that writes a new file containing only records whose QC status is not "bad". This new file will serve as input to other rules that build on high-quality Nextclade annotations.
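As a rough illustration of what such a filtering step does, here is a minimal sketch (not the rule's actual implementation) that assumes Nextclade's TSV output and its `qc.overallStatus` column, whose values include "good", "mediocre", and "bad". The function names are hypothetical. Records with an empty status are kept, since they are not explicitly "bad".

```python
import csv
import io


def filter_bad_qc(rows, status_column="qc.overallStatus"):
    """Yield Nextclade records whose overall QC status is not "bad"."""
    for row in rows:
        if row.get(status_column) != "bad":
            yield row


def filter_nextclade_tsv(in_handle, out_handle, status_column="qc.overallStatus"):
    """Copy a Nextclade TSV, dropping "bad" records; return the number of records kept."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in filter_bad_qc(reader, status_column):
        writer.writerow(row)
        kept += 1
    return kept


# Example with an in-memory TSV (illustrative values only).
example_tsv = "seqName\tclade\tqc.overallStatus\nA\t2a\tgood\nB\t2a\tbad\nC\t2b\tmediocre\n"
filtered = io.StringIO()
n_kept = filter_nextclade_tsv(io.StringIO(example_tsv), filtered)
```

Writing the filtered records to their own file, as the commit describes, means downstream rules can share one QC decision instead of each repeating it.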
Adds rules to get derived haplotypes from Nextclade annotations for all data and then join those haplotypes with the metadata. The resulting metadata file has strain name, collection date, and haplotype columns that we need for the next steps of the workflow to estimate haplotype frequencies and annotate haplotypes by available titer references.
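The join step can be pictured as follows. This is a minimal sketch under two assumptions not stated in the PR: haplotypes are keyed by a strain name that matches the metadata's `strain` column, and records without a haplotype are dropped (an inner join). Column names and haplotype labels are hypothetical.

```python
def join_haplotypes(
    metadata_rows,
    haplotype_by_strain,
    strain_column="strain",
    date_column="date",
    haplotype_column="haplotype",
):
    """Return strain, collection date, and haplotype for metadata records with a haplotype.

    Records whose strain has no derived haplotype are skipped (inner join).
    """
    joined = []
    for row in metadata_rows:
        strain = row[strain_column]
        if strain in haplotype_by_strain:
            joined.append(
                {
                    strain_column: strain,
                    date_column: row[date_column],
                    haplotype_column: haplotype_by_strain[strain],
                }
            )
    return joined


# Illustrative records; "hapA" is a placeholder haplotype label.
metadata = [
    {"strain": "strain_1", "date": "2024-05-01", "region": "Europe"},
    {"strain": "strain_2", "date": "2024-05-03", "region": "Asia"},
]
joined = join_haplotypes(metadata, {"strain_1": "hapA"})
```

Keeping only these three columns matches what the later frequency and titer-annotation steps need.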
Adds a script and rule to estimate a "tip" frequencies JSON from metadata alone. `augur frequencies` doesn't provide this simple functionality directly, so this commit adds a script that replicates some of that Augur command's internal logic to produce a tip frequencies JSON with the KDE-based method. Since KDE frequency estimates only require a list of dates, we can estimate frequencies for each sequence in the metadata and use those estimates in a subsequent rule to estimate frequencies of derived haplotypes. In this commit, I chose to limit the frequency estimation period to a max date 4 weeks before the current run date and a min date 16 weeks before it. Initially, these frequencies will only be used to compare the most recent value with the value at the previous timepoint to calculate a delta frequency.
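A simplified sketch of KDE tip frequencies from dates alone is below. It is not Augur's implementation: it uses a single Gaussian kernel per tip, whereas Augur's KDE method also supports mixing in a wider kernel. The weekly pivot spacing and the 1/12-year bandwidth are assumptions for illustration; only the 4- and 16-week window comes from the commit message. Each tip's kernel is normalized to sum to 1 across pivots, then frequencies are normalized to sum to 1 at each pivot.

```python
import datetime
import math
from typing import Dict, List, Tuple


def decimal_year(day: datetime.date) -> float:
    """Convert a calendar date to a decimal year."""
    start = datetime.date(day.year, 1, 1)
    days_in_year = (datetime.date(day.year + 1, 1, 1) - start).days
    return day.year + (day - start).days / days_in_year


def frequency_window(
    run_date: datetime.date, min_weeks: int = 16, max_weeks: int = 4
) -> Tuple[datetime.date, datetime.date]:
    """Return (min_date, max_date) as 16 and 4 weeks before the run date."""
    return (
        run_date - datetime.timedelta(weeks=min_weeks),
        run_date - datetime.timedelta(weeks=max_weeks),
    )


def weekly_pivots(min_date: datetime.date, max_date: datetime.date) -> List[float]:
    """Hypothetical weekly pivots from min_date through max_date, as decimal years."""
    pivots = []
    day = min_date
    while day <= max_date:
        pivots.append(decimal_year(day))
        day += datetime.timedelta(weeks=1)
    return pivots


def kde_tip_frequencies(
    tip_dates: Dict[str, float], pivots: List[float], bandwidth: float = 1 / 12
) -> Dict[str, List[float]]:
    """Estimate per-tip frequencies at each pivot from collection dates alone."""
    densities = {}
    for strain, date in tip_dates.items():
        # Unnormalized Gaussian kernel; its constant factor cancels when normalizing.
        values = [math.exp(-0.5 * ((pivot - date) / bandwidth) ** 2) for pivot in pivots]
        total = sum(values)
        # Give every tip the same total weight across all pivots.
        densities[strain] = [v / total for v in values] if total > 0 else [0.0] * len(pivots)

    # Normalize so tip frequencies sum to 1 at each pivot (when any tip contributes).
    pivot_totals = [sum(density[i] for density in densities.values()) for i in range(len(pivots))]
    return {
        strain: [v / t if t > 0 else 0.0 for v, t in zip(density, pivot_totals)]
        for strain, density in densities.items()
    }


# Example run date and tip dates (illustrative only).
run_date = datetime.date(2024, 7, 5)
min_date, max_date = frequency_window(run_date)
pivots = weekly_pivots(min_date, max_date)
tips = {
    "strain_1": decimal_year(datetime.date(2024, 4, 1)),
    "strain_2": decimal_year(datetime.date(2024, 5, 20)),
}
freqs = kde_tip_frequencies(tips, pivots)
```

Because only dates enter the calculation, the same function works for every record in the metadata without needing a tree.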
Adds a rule and script to summarize derived haplotype frequencies from all available data.
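One plausible way to summarize haplotype frequencies from per-tip estimates is to sum the tip trajectories that share a haplotype, then take the delta frequency described above as the last pivot's value minus the previous pivot's. The sketch below assumes that approach; function names and inputs are hypothetical.

```python
from typing import Dict, List


def haplotype_frequencies(
    tip_frequencies: Dict[str, List[float]], haplotype_by_strain: Dict[str, str]
) -> Dict[str, List[float]]:
    """Sum per-tip frequency trajectories by haplotype, skipping tips without one."""
    summed: Dict[str, List[float]] = {}
    for strain, freqs in tip_frequencies.items():
        haplotype = haplotype_by_strain.get(strain)
        if haplotype is None:
            continue
        if haplotype not in summed:
            summed[haplotype] = [0.0] * len(freqs)
        for i, value in enumerate(freqs):
            summed[haplotype][i] += value
    return summed


def delta_frequencies(haplotype_freqs: Dict[str, List[float]]) -> Dict[str, float]:
    """Most recent frequency minus the frequency at the previous pivot."""
    return {
        haplotype: freqs[-1] - freqs[-2]
        for haplotype, freqs in haplotype_freqs.items()
        if len(freqs) >= 2
    }


# Illustrative tip frequencies at two pivots.
tip_freqs = {"s1": [0.25, 0.5], "s2": [0.25, 0.25], "s3": [0.5, 0.25]}
haplotypes = {"s1": "hapA", "s2": "hapA", "s3": "hapB"}
summary = haplotype_frequencies(tip_freqs, haplotypes)
deltas = delta_frequencies(summary)
```

Since tip frequencies sum to 1 at each pivot, the summed haplotype frequencies do too, provided every tip has a haplotype.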
Updates the script that annotates derived haplotypes for nodes in the tree to use the same style as the haplotypes table: hyphen-delimited mutations (which, unlike comma-delimited lists, work as values in URL parameters) with the ancestral allele included for each mutation. These changes should allow us to link from the haplotype tables to the tree view for the same haplotypes.
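A minimal sketch of the mutation style described above, assuming mutations arrive as hypothetical `(ancestral, position, derived)` tuples and are ordered by position. It also shows one concrete property of hyphens: they are unreserved URL characters, so `urllib.parse.quote` leaves them alone, while it percent-encodes commas.

```python
from urllib.parse import quote


def format_mutations(mutations):
    """Join (ancestral, position, derived) tuples as hyphen-delimited labels like "K135T"."""
    ordered = sorted(mutations, key=lambda mutation: mutation[1])
    return "-".join(f"{ancestral}{position}{derived}" for ancestral, position, derived in ordered)


# Illustrative mutations, not real haplotype data.
label = format_mutations([("S", 145, "N"), ("K", 135, "T")])
hyphen_quoted = quote(label)
comma_quoted = quote(label.replace("-", ","))
```

Including the ancestral allele makes each label self-describing, so the same string can identify a haplotype in both the table and the tree view.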
Description of proposed changes
Replaces the current table of derived haplotype frequencies and titer references that is based on a subsampled HA tree with a table based on all available sequences during the same time period.
With the latest version of Nextclade, we can determine a derived haplotype string per record from a Nextclade annotations file with columns for clade and for mutations relative to each clade. We can then calculate haplotype frequencies from all available data instead of only the subset of data used to build a tree.
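The per-record derivation can be sketched as follows. This assumes Nextclade's `seqName` and `clade` columns plus a column of comma-separated mutations relative to the clade in the same `gene:refPOSalt` style as Nextclade's `aaSubstitutions`. The column name `relative_mutations` and the `clade:mutations` output format are hypothetical; the actual column added by nextstrain/nextclade#1492 may be named differently.

```python
import csv
import io
from typing import Dict, Iterable


def derived_haplotype(clade: str, relative_mutations: str) -> str:
    """Combine a clade with its relative mutations into one hyphen-delimited label."""
    mutations = [m.strip() for m in relative_mutations.split(",") if m.strip()]
    if not mutations:
        # A record with no mutations relative to its clade is just the clade.
        return clade
    return f"{clade}:{'-'.join(mutations)}"


def haplotypes_from_nextclade(
    rows: Iterable[Dict[str, str]], mutations_column: str = "relative_mutations"
) -> Dict[str, str]:
    """Map each Nextclade seqName to its derived haplotype label."""
    return {
        row["seqName"]: derived_haplotype(row["clade"], row[mutations_column] or "")
        for row in rows
    }


# Illustrative annotations; clade and mutation values are placeholders.
example_tsv = (
    "seqName\tclade\trelative_mutations\n"
    "s1\tclade_A\tHA1:K135T,HA1:S145N\n"
    "s2\tclade_A\t\n"
)
haplotypes = haplotypes_from_nextclade(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
```

Because each record is handled independently, this scales to every sequence in the annotations file rather than only those sampled into a tree.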
Development checklist
Related issue(s)
Related to #130
Depends on nextstrain/nextclade#1492
Checklist