How does bcftools isec identify the same sites #2230

Open
ym-chen opened this issue Jul 23, 2024 · 1 comment

Comments


ym-chen commented Jul 23, 2024

I used bcftools isec to find the sites shared by multiple VCF files, but one site in the output confuses me. The records from the two VCFs are:

vcf1
chr3 183987815 . G A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=152,144|24,25;DP=351;ECNT=1;GERMQ=93;MBQ=38,31;MFRL=288,304;MMQ=60,60;MPOS=43;NALOD=2.37;NLOD=68.24;POPAF=6;TLOD=129.08 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:236,1:0.00432:237:114,0:98,0:234,1:118,118,0,1 0/1:60,48:0.448:108:28,23:26,21:58,47:34,26,24,24
vcf2
chr3 183987815 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=210,249|22,22;DP=507;ECNT=1;GERMQ=93;MBQ=38,33;MFRL=292,282;MMQ=60,60;MPOS=39;NALOD=2.31;NLOD=60.38;POPAF=6;TLOD=100.18 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:203,0:0.004889:203:109,0:88,0:202,0:85,118,0,0 0/1:256,44:0.145:300:119,23:127,21:254,44:125,131,22,22

Even though the ALT alleles of these two records differ (A in vcf1, T in vcf2), isec still reports them as a common site. This is not what I expected. Why does isec treat these two records as the same site?


keenhl commented Jul 24, 2024

Try the -c/--collapse option. From the bcftools manual:

-c, --collapse snps|indels|both|all|some|none|id
Controls how to treat records with duplicate positions and defines compatible records across multiple input files. Here by "compatible" we mean records which should be considered as identical by the tools. For example, when performing line intersections, the desire may be to consider as identical all sites with matching positions (bcftools isec -c all), or only sites with matching variant type (bcftools isec -c snps -c indels), or only sites with all alleles identical (bcftools isec -c none).

none
only records with identical REF and ALT alleles are compatible
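
To make the difference between the two extremes concrete, here is a toy Python sketch of what "compatible" means for `-c all` versus `-c none`. It is only an illustration of the idea described in the manual text above, not how bcftools is implemented: `Record`, `site_key` and `shared_sites` are hypothetical names made up for this example, only the `all` and `none` modes are modelled, and each record is assumed to carry a single ALT allele.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Minimal stand-in for a VCF record (hypothetical, single ALT only)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def site_key(rec: Record, collapse: str) -> tuple:
    """Return the key used to decide whether two records are 'compatible'."""
    if collapse == "all":
        # Matching position is enough; alleles are ignored.
        return (rec.chrom, rec.pos)
    if collapse == "none":
        # Position, REF and ALT must all be identical.
        return (rec.chrom, rec.pos, rec.ref, rec.alt)
    raise ValueError(f"mode not modelled in this sketch: {collapse}")


def shared_sites(a: list, b: list, collapse: str) -> list:
    """Keys present in both record lists under the given collapse mode."""
    keys_a = {site_key(r, collapse) for r in a}
    keys_b = {site_key(r, collapse) for r in b}
    return sorted(keys_a & keys_b)


# The two records from the question, reduced to CHROM, POS, REF, ALT.
vcf1 = [Record("chr3", 183987815, "G", "A")]
vcf2 = [Record("chr3", 183987815, "G", "T")]

print(shared_sites(vcf1, vcf2, "all"))   # [('chr3', 183987815)]
print(shared_sites(vcf1, vcf2, "none"))  # []
```

Under position-only matching the G>A and G>T records count as one shared site, while matching on REF and ALT keeps them apart, which is the behaviour the question expects.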
