How to populate Fisher test annotation #2287

Open
pd3 opened this issue Sep 26, 2024 · 1 comment

Member

pd3 commented Sep 26, 2024

Hello, I'm revisiting this comment as I've run into a similar observation in my dataset: the FS field (Fisher test for strand bias) does not seem to be correctly populated. It is coming up as zero for all genotypes, so everything is passing. I've read through this thread and am a bit confused: is this something that needs to be explicitly specified (i.e., accounting for this is not the default)?

We've just spent a long time calling reads for a very large dataset and would like to avoid doing it all over again. Is there a way to populate this field after the bcftools mpileup? We have retained the read depth for forward/reverse reads supporting alt/ref, so is it possible to calculate this without recalling genotypes? Thank you so much!

Originally posted by @paigeduffin in #1834 (comment)

Member Author

pd3 commented Sep 26, 2024

This depends on the version used: recent versions of mpileup have to be run with -a INFO/FS in order for the field to be populated. Run bcftools mpileup -a \? to see which annotations are filled in by default and which need to be given explicitly.

But what can be done now, if recalling is not an option? I see three possibilities:

  1. Extract the necessary information using bcftools query, compute the test using third-party software, then reannotate with bcftools annotate.

  2. Use the bcftools +ad-bias plugin, which can filter by Fisher test on FMT/AD.

  3. Perhaps the best option would be to extend the +fill-tags plugin. It currently supports the following tags, and adding a Fisher test would be trivial:

$ bcftools +fill-tags -- -l  
INFO/AC        Number:A  Type:Integer  ..  Allele count in genotypes
INFO/AC_Hom    Number:A  Type:Integer  ..  Allele counts in homozygous genotypes
INFO/AC_Het    Number:A  Type:Integer  ..  Allele counts in heterozygous genotypes
INFO/AC_Hemi   Number:A  Type:Integer  ..  Allele counts in hemizygous genotypes
INFO/AF        Number:A  Type:Float    ..  Allele frequency from FMT/GT or AC,AN if FMT/GT is not present
INFO/AN        Number:1  Type:Integer  ..  Total number of alleles in called genotypes
INFO/ExcHet    Number:A  Type:Float    ..  Test excess heterozygosity; 1=good, 0=bad
INFO/END       Number:1  Type:Integer  ..  End position of the variant
INFO/F_MISSING Number:1  Type:Float    ..  Fraction of missing genotypes (all samples, experimental)
INFO/HWE       Number:A  Type:Float    ..  HWE test (PMID:15789306); 1=good, 0=bad
INFO/MAF       Number:1  Type:Float    ..  Frequency of the second most common allele
INFO/NS        Number:1  Type:Integer  ..  Number of samples with data
INFO/TYPE      Number:.  Type:String   ..  The record type (REF,SNP,MNP,INDEL,etc)
FORMAT/VAF     Number:A  Type:Float    ..  The fraction of reads with the alternate allele, requires FORMAT/AD or ADF+ADR
FORMAT/VAF1    Number:1  Type:Float    ..  The same as FORMAT/VAF but for all alternate alleles cumulatively
TAG:Number=Type(EXPR)                  ..  Experimental support for user expressions such as DP:1=int(sum(DP))
               If Number and Type are not given (e.g. DP=sum(DP)), variable number (Number=.) of floating point
               values (Type=Float) will be used.
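
For option 1, the "third-party" step could be sketched roughly as below. This is only an illustration under several assumptions, not what mpileup does internally: it assumes the input is tab-separated text with CHROM, POS, REF, ALT followed by one FMT/ADF and FMT/ADR pair per sample (for example as produced by a bcftools query format string along the lines of '%CHROM\t%POS\t%REF\t%ALT[\t%ADF\t%ADR]\n'), that depths are summed over all samples and all alternate alleles, and that the p-value should be phred-scaled. The function names are hypothetical, and the scaling and the handling of multiallelic sites should be checked against the FS definition in the mpileup header before relying on the values.

```python
import math


def _log_table_prob(a, b, c, d):
    """Log hypergeometric probability of a 2x2 table with fixed margins."""
    lg = math.lgamma
    return (lg(a + b + 1) + lg(c + d + 1) + lg(a + c + 1) + lg(b + d + 1)
            - lg(a + 1) - lg(b + 1) - lg(c + 1) - lg(d + 1)
            - lg(a + b + c + d + 1))


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    if n == 0:
        return 1.0
    log_obs = _log_table_prob(a, b, c, d)
    p = 0.0
    # Enumerate every table with the same margins; sum those that are
    # no more likely than the observed one.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = _log_table_prob(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if lp <= log_obs + 1e-7:
            p += math.exp(lp)
    return min(1.0, p)


def strand_counts(adf_fields, adr_fields):
    """Sum ref/alt depths per strand over samples; '.' is counted as 0."""
    ref_f = alt_f = ref_r = alt_r = 0
    for adf, adr in zip(adf_fields, adr_fields):
        fwd = [0 if v == "." else int(v) for v in adf.split(",")]
        rev = [0 if v == "." else int(v) for v in adr.split(",")]
        ref_f += fwd[0]
        alt_f += sum(fwd[1:])  # assumption: all ALT alleles pooled
        ref_r += rev[0]
        alt_r += sum(rev[1:])
    return ref_f, ref_r, alt_f, alt_r


def annotate_line(line):
    """Turn one query line into CHROM, POS, REF, ALT, phred-scaled p-value."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[:4]
    per_sample = fields[4:]
    counts = strand_counts(per_sample[0::2], per_sample[1::2])
    p = fisher_two_sided(*counts)
    # Assumption: phred scaling; guard against log10(0) for tiny p-values.
    phred = 0.0 if p >= 1.0 else -10.0 * math.log10(max(p, 1e-300))
    return f"{chrom}\t{pos}\t{ref}\t{alt}\t{phred:.4f}"
```

Applying annotate_line to each line of the query output gives a table that, once bgzip-compressed and tabix-indexed, could be passed to bcftools annotate together with a header line describing the new INFO tag.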
