Some SNV genome variants were found outside of --target_bed file with --bait_padding #631

Open
sitems opened this issue Oct 13, 2024 · 2 comments · May be fixed by #636
Labels
bug Something isn't working

Comments

@sitems

sitems commented Oct 13, 2024

Description of the bug

The relevant portion of my --target_bed file looks like this:

chr1 35720 35736
chr1 69088 69970
chr1 138529 139696

I'm processing a WES FASTQ file with the options '--target_bed' and '--bait_padding 500', but in output/call_snv/genome/output_snv.vcf.gz I see variants like this:

chr1 88177 chr1_88177_G_C G C 8 . AF=1;AQ=8;FOUND_IN=deepvariant GT:DP:AD:GQ:PL:RNC 1/1:2:0,2:6:8,6,0:..

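The variant is outside of every bed region, even after padding: with 500 bp of padding the two nearest regions span 68588-70470 and 138029-140196, and position 88177 falls between them.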
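A quick way to confirm this kind of report is to pad the BED intervals and test the variant position against them. The snippet below is only a minimal sketch, not the pipeline's own logic: the helper names `pad_intervals` and `in_any_region` are hypothetical, and it assumes BED coordinates are 0-based half-open while the VCF position is 1-based.

```python
# Hypothetical helpers for illustration; not part of the pipeline.

def pad_intervals(intervals, padding):
    """Extend each (chrom, start, end) BED interval by `padding` on both sides."""
    return [
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    ]


def in_any_region(chrom, pos_1based, regions):
    """Return True if a 1-based VCF position lies in any 0-based half-open region."""
    pos0 = pos_1based - 1  # convert VCF position to a 0-based coordinate
    return any(c == chrom and s <= pos0 < e for c, s, e in regions)


# Intervals copied from the reported --target_bed excerpt.
targets = [
    ("chr1", 35720, 35736),
    ("chr1", 69088, 69970),
    ("chr1", 138529, 139696),
]

padded = pad_intervals(targets, 500)
print(padded)
print(in_any_region("chr1", 88177, padded))  # False: the variant is off-target
```

The same check could be run over a whole VCF to count how many calls fall outside the padded targets.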

Command used and terminal output

No response

Relevant files

No response

System information

No response

sitems added the bug label Oct 13, 2024
@ramprasadn
Collaborator

Thanks for reporting this, sitems! I am working on a fix in #633.

@jemten
Collaborator

jemten commented Oct 17, 2024

I have a new issue running DeepVariant on exomes after the merge of PR #633.

***** Running the command:*****
time seq 0 35 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "grch37_homo_sapiens_-d5-.fasta" --reads "ADM997A2_sorted_md.bam" --examples "tmp/[email protected]" --channels "insert_size" --gvcf "tmp/[email protected]" --regions "home_bait.intervals_list" --task {}

I1017 19:10:23.285372 139672540231488 genomics_reader.py:222] Reading ADM997A2_sorted_md.bam with NativeSamReader
I1017 19:10:23.315367 139672540231488 make_examples_core.py:301] Task 3/36: Preparing inputs
I1017 19:10:23.389680 139672540231488 genomics_reader.py:222] Reading ADM997A2_sorted_md.bam with NativeSamReader
I1017 19:10:23.405181 139672540231488 make_examples_core.py:301] Task 3/36: Common contigs are ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT']
Traceback (most recent call last):
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 234, in <module>
    app.run(main)
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/absl_py/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/absl_py/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 224, in main
    make_examples_core.make_examples_runner(options)
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 2739, in make_examples_runner
    regions, calling_regions = processing_regions_from_options(options)
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 2641, in processing_regions_from_options
    calling_regions = build_calling_regions(
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 607, in build_calling_regions
    ranges.RangeSet.from_regions(regions_to_include, contig_dict)
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 170, in from_regions
    return cls(ranges=from_regions(regions, contig_map=contig_map))
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 117, in __init__
    for i, range_ in enumerate(ranges):
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 509, in from_regions
    yield parse_literal(region, contig_map)
  File "/home/proj/stage/analysis/cases/cleanshrimp/work/Bazel.runfiles_ikroupyj/runfiles/com_google_deepvariant/third_party/nucleus/util/ranges.py", line 597, in parse_literal
    raise ValueError(
ValueError: Could not parse "home_bait.intervals_list" as a region literal.  Region literals should have the form "chr:start-stop" or "chr:start" or just "chr".  A common error is to use the "chr" prefix on inputs that don't have it, or vice-versa.

It looks like the calling regions must be given in BED format rather than interval_list format. From the DeepVariant help output:

--regions: Optional. Space-separated list of regions we want to process. Elements can be region literals (e.g., chr20:10-20) or paths to BED/BEDPE files.

What do you think about using the BED file supplied via the --target_bed parameter instead?
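If the interval_list had to remain the input for some reason, another option would be converting it to BED before passing it to --regions. The snippet below is a rough sketch of that conversion, assuming the usual Picard-style layout (header lines starting with `@`, then tab-separated contig, 1-based start, inclusive end, strand, name). The function name `interval_list_to_bed_lines` and the example records are made up for illustration.

```python
# Hypothetical converter for illustration; not part of the pipeline or DeepVariant.

def interval_list_to_bed_lines(lines):
    """Convert interval_list records to BED lines (0-based, half-open)."""
    bed = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the SAM-style header (@HD, @SQ, ...).
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # 1-based inclusive start becomes 0-based; the end stays unchanged.
        bed.append(f"{chrom}\t{start - 1}\t{end}")
    return bed


# Made-up example records, using contig names without the "chr" prefix.
example = [
    "@HD\tVN:1.6",
    "@SQ\tSN:1\tLN:249250621",
    "1\t69089\t69970\t+\ttarget_1",
]

for bed_line in interval_list_to_bed_lines(example):
    print(bed_line)
```

Using the file already supplied with --target_bed avoids the extra conversion step, which is probably the simpler route.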
