Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

different snv numbers for different aproaches #8

Open
pziarsolo opened this issue Sep 28, 2015 · 2 comments
Open

different snv numbers for different aproaches #8

pziarsolo opened this issue Sep 28, 2015 · 2 comments

Comments

@pziarsolo
Copy link

Hi,
I am testing with a small data samples how recall works in bcbio. The data set consist in two samples against tomato genome chr1.
I have runned 3 different pipelines to check differences between them:
First aproach: run both samples together with freebayes. COmmand: freebayes -f SL2.50ch01.fasta --min-repeat-entropy 1 --use-best-n-alleles 4 --min-mapping-quality 20 --min-mapping-quality 57 --mismatch-base-quality-threshold 15 --read-max-mismatch-fraction 0.05 --min-coverage 6 --genotype-qualities --report-genotype-likelihood-max --min-alternate-count 2 --min-alternate-fraction 0.2 -b sample1.bam sample2.bam.
Result:
$ vcffilter -f 'QUAL > 20' together.vcf.gz|grep -v "^#"|wc -l
473

Second approach: use bcbio.recall with both bams and a vcf for each sample(called using above command for each sample).
Result:
$ vcffilter -f 'QUAL > 20' bcbio_recall.vcf.gz|grep -v "^#"|wc -l
48

Third approach:
1.- Call snps using above command with sample1.
2.- Next I call snps using sample1 snps as input-vcf.
3.- Use vcfintersect to get new spns in sample 2 not in sample 1.
4- Using the new snps from step 3 using as input-variant call freebayes. Use only-use-input-alleles option for freebayes.
5.- Merge snps for sample 1 using results from step 1 and 4.(GATK)
6.- Merge snps from sample 1 and sample2(bcftools merge)

result:
$ vcffilter -f 'QUAL > 20' both_recall.vcf.gz|grep -v "^#"|wc -l
358

I know that you know more than me in this area. Do you know why is this happening? Maybe I am been to strict with the filtering options of freebayes?

Thanks in advance.
Peio Ziarsolo

@chapmanb
Copy link
Member

Peio;
Thanks for testing this out and sorry about the problems. It looks like your step 3 uses a similar approach to that done in bcbio.variation.recall. The main difference is probably in how the quality values are assigned to the final files. Freebayes will assign low quality scores on recalling, and some merge approaches use the minimum quality score while others use an average or max. If you remove the QUAL > 20 filter and just count on PASS, you should see similar results from bcbio.recall. Hope this explains the discrepancy.

@pziarsolo
Copy link
Author

Thanks Brad,
The discrepancy is smaller if I remove the filter but it's still there. Between 800 snv in all_together and 500 recalling. I will continue looking at it :)
peio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants