You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
I am testing with a small data samples how recall works in bcbio. The data set consist in two samples against tomato genome chr1.
I have runned 3 different pipelines to check differences between them:
First aproach: run both samples together with freebayes. COmmand: freebayes -f SL2.50ch01.fasta --min-repeat-entropy 1 --use-best-n-alleles 4 --min-mapping-quality 20 --min-mapping-quality 57 --mismatch-base-quality-threshold 15 --read-max-mismatch-fraction 0.05 --min-coverage 6 --genotype-qualities --report-genotype-likelihood-max --min-alternate-count 2 --min-alternate-fraction 0.2 -b sample1.bam sample2.bam.
Result:
$ vcffilter -f 'QUAL > 20' together.vcf.gz|grep -v "^#"|wc -l
473
Second approach: use bcbio.recall with both bams and a vcf for each sample(called using above command for each sample).
Result:
$ vcffilter -f 'QUAL > 20' bcbio_recall.vcf.gz|grep -v "^#"|wc -l
48
Third approach:
1.- Call snps using above command with sample1.
2.- Next I call snps using sample1 snps as input-vcf.
3.- Use vcfintersect to get new spns in sample 2 not in sample 1.
4- Using the new snps from step 3 using as input-variant call freebayes. Use only-use-input-alleles option for freebayes.
5.- Merge snps for sample 1 using results from step 1 and 4.(GATK)
6.- Merge snps from sample 1 and sample2(bcftools merge)
Peio;
Thanks for testing this out and sorry about the problems. It looks like your step 3 uses a similar approach to that done in bcbio.variation.recall. The main difference is probably in how the quality values are assigned to the final files. Freebayes will assign low quality scores on recalling, and some merge approaches use the minimum quality score while others use an average or max. If you remove the QUAL > 20 filter and just count on PASS, you should see similar results from bcbio.recall. Hope this explains the discrepancy.
Thanks Brad,
The discrepancy is smaller if I remove the filter but it's still there. Between 800 snv in all_together and 500 recalling. I will continue looking at it :)
peio
Hi,
I am testing with a small data samples how recall works in bcbio. The data set consist in two samples against tomato genome chr1.
I have runned 3 different pipelines to check differences between them:
First aproach: run both samples together with freebayes. COmmand: freebayes -f SL2.50ch01.fasta --min-repeat-entropy 1 --use-best-n-alleles 4 --min-mapping-quality 20 --min-mapping-quality 57 --mismatch-base-quality-threshold 15 --read-max-mismatch-fraction 0.05 --min-coverage 6 --genotype-qualities --report-genotype-likelihood-max --min-alternate-count 2 --min-alternate-fraction 0.2 -b sample1.bam sample2.bam.
Result:
$ vcffilter -f 'QUAL > 20' together.vcf.gz|grep -v "^#"|wc -l
473
Second approach: use bcbio.recall with both bams and a vcf for each sample(called using above command for each sample).
Result:
$ vcffilter -f 'QUAL > 20' bcbio_recall.vcf.gz|grep -v "^#"|wc -l
48
Third approach:
1.- Call snps using above command with sample1.
2.- Next I call snps using sample1 snps as input-vcf.
3.- Use vcfintersect to get new spns in sample 2 not in sample 1.
4- Using the new snps from step 3 using as input-variant call freebayes. Use only-use-input-alleles option for freebayes.
5.- Merge snps for sample 1 using results from step 1 and 4.(GATK)
6.- Merge snps from sample 1 and sample2(bcftools merge)
result:
$ vcffilter -f 'QUAL > 20' both_recall.vcf.gz|grep -v "^#"|wc -l
358
I know that you know more than me in this area. Do you know why is this happening? Maybe I am been to strict with the filtering options of freebayes?
Thanks in advance.
Peio Ziarsolo
The text was updated successfully, but these errors were encountered: