Skip to content

Latest commit

 

History

History
238 lines (187 loc) · 9.6 KB

README.md

File metadata and controls

238 lines (187 loc) · 9.6 KB

rhocall

Call regions of homozygosity and make tentative UPD calls

Usage

Usage: rhocall [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  aggregate  Aggregate runs of autozygosity from rhofile...
  annotate  Markup VCF file using rho-calls.
  call      Call runs of autozygosity.
  tally     Tally runs of autozygosity from rhofile.
  viz       Plot binned zygosity and RHO-regions.

rhocall call

Usage: rhocall call [OPTIONS] VCF

  Call runs of autozygosity.

Options:
  -m, --max_hets FLOAT            Max heterozygotes per Mb in a homozygous
                                  block
  -f, --max_het_fraction FLOAT    Max heterozygotes over homozygotes fraction
                                  in a homozygous block
  -n, --minimum_homs INTEGER      Minimum absolute number of homozygotes to
                                  report a block
  -s, --shortest_block INTEGER    Shortest block
  -u, --flag_upd_at_fraction FLOAT
                                  Flag UPD if homozygous blocks span this
                                  fraction of total chr size
  -k, --individual INTEGER        Index of individual in vcf/bcf, 0-based.
  -s, --block_constant INTEGER    Should be adapted to type of analysis(exome
                                  or genome)
  -v, --verbose
  --help                          Show this message and exit.

rhocall aggregate

Usage: rhocall aggregate [OPTIONS] ROH

  Aggregate runs of autozygosity from rhofile into windowed rho BED file.
  Accepts a bcftools roh style TSV-file with CHR,POS,AZ,QUAL.

Options:
  -q, --quality_threshold FLOAT  Minimum quality trusted to start or end ROH-
                                 windows.
  -v, --verbose
  -o, --output FILENAME
  --help                         Show this message and exit.

rhocall tally

Usage: rhocall tally [OPTIONS] ROH

  Tally runs of autozygosity from rhofile. Accepts a bcftools roh style TSV-
  file with CHR,POS,AZ,QUAL.

Options:
  -q, --quality_threshold FLOAT   Minimum quality that counts towards region
                                  totals.
  -u, --flag_upd_at_fraction FLOAT
                                  Flag UPD if this fraction of chr quality
                                  positions called AZ.
  -v, --verbose
  -o, --output FILENAME
  --help                          Show this message and exit.

rhocall annotate

Usage: rhocall annotate [OPTIONS] VCF

  Markup VCF file using rho-calls. Use BED file to mark all variants in AZ
  windows. Alternatively, use a bcftools v>=1.4 file with RG entries to mark
  all vars. With the --no-v14 flag, use an older bcftools v<=1.2 style roh
  TSV to mark only selected AZ variants. Roh is broken in bcftools v1.3  -
  do not use.

Options:
  -r FILENAME                     Bcftools roh style TSV file with
                                  CHR,POS,AZ,QUAL.
  --v14 / --no-v14                Bcftools v1.4 or newer roh file including RG
                                  calls.
  -b FILENAME                     BED file with AZ windows.
  -q, --quality_threshold FLOAT   Minimum quality calls that are imported in
                                  region totals.
  -u, --flag_upd_at_fraction FLOAT
                                  Flag UPD if this fraction of chr quality
                                  positions called AZ.
  -v, --verbose
  -o, --output FILENAME
  --help                          Show this message and exit.

rhocall viz


  Plot binned zygosity and RHO-regions.

Options:
  --out_dir PATH              Output directory. The files will be named
                              out_dir/chr.png. One picture is drawn per
                              chromosome.  [required]
  --wig / --no-wig            Produce wig file.
  -p, --pointsize INTEGER     Size of the points (pixels)
  -r, --rho FILENAME          Input RHO file produced from rhocall  [required]
  -m, --minsnv INTEGER        Minimum number of snvs for each plotted bin
  -M, --maxsnv INTEGER        Maximum number of snvs for each plotted bin
  --minaf FLOAT               Minimum allele frequency. This variable must be
                              set to 0 if the allele frequency is not
                              annotated.
  --maxaf FLOAT               Maximum allele frequency
  --aftag TEXT                The allele frequency tag to use.
  -q, --minqual INTEGER       Do not add SNVs to a bin if their quality is
                              less than this value.
  --mnv / --no-mnv            Include MNV
  -w, --window INTEGER        Window size(bases)
  -s, --rsid / --no-rsid      Skip variants not containing an rsid
  -n, --filter / --no-filter  include variants, even if they are not labeled
                              PASS
  -v, --verbose
  --help                      Show this message and exit.

rhoviz (standalone version)

Usage: rhoviz [OPTIONS] -i input.vcf -r rho.tab -d output_dir

  Visualise the rhocall output file. Genomic regions labeled RHO are visualised as red lines.
  Additionally, the fraction of homozygous snps are visualised as black dots.

Options:
  -r FILENAME                     Input RHO file produced from rhocall

  --help                          Show this message and exit.

  -i                              Input vcf file

  -d                              output directory, the files will be named dir/chr.png,

  -w                              window size(bases) (default = 10 000)

  -m                              minimum number of snvs for each plotted bin (default =2)

  -M                              maximum number of snvs for each plotted bin (default = 20)

  --minaf                         minimum allele frequency (default = 0.1)
                                  (this variable must be set to 0 if the allele frequency is not annotated)

  --maxaf                         maximum allele frequency (default = 0.9)

  --aftag AFTAG                   the allele frequency tag (default = 1000GAF)

  -q                              do not add snvs to a bin if there quality is less than this value (default = 60)
  -p                              Size of the points (pixels)

  -n                              include variants, even if they are not labeled PASS


Examples

Suggested workflow

Preparation

bcftools query -f'%CHROM\t%POS\t%REF,%ALT\t%INFO/AF\n' popfreq.vcf.gz | bgzip -c > popfreq.tab.gz

Call ROH with bcftools

Please see the samtools project for installation instructions, and please refer to Narasimhan et al, 2016 regarding method details.

bcftools roh --AF-file popfreq.tab.gz -I sample.bcf > sample.roh

Annotate variant file (VCF/BCF) with ROH calls

rhocall annotate --v14 -r sample.roh -o sample.14.rho.vcf sample.bcf

Legacy mode for bcftools<1.4: Aggregate ROH calls into windows and annotate

rhocall aggregate sample.roh -o sample.roh.bed
rhocall annotate -b sample.roh.bed -o sample.rho.vcf sample.bcf

Obtain per chromosome overview

rhocall tally sample.roh -o sample.roh.tally.tsv

Export calls and zygosity data to wig and bed files

rhocall viz --rho sample.roh --wig --out_dir rhocall sample.vcf

Additional usage examples

bcftools query -f'%CHROM\t%POS\t%REF,%ALT\t%INFO/AF\n' anon-SweGen_STR_NSPHS_1000samples_snp_freq_hg19.vcf.gz | bgzip -c > anon_SweGen_161019_snp_freq_hg19.tab.gz
bcftools roh --AF-file anon_SweGen_161019_snp_freq_hg19.tab.gz -I 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.bcf > 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh

# bcftools <=1.2
rhocall tally 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh -o 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh.tally.tsv
rhocall annotate --no-v14 -r 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.bcf -o 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh.vcf
rhocall aggregate 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh -o 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh.bed
rhocall annotate -b 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.roh.bed -o 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.rho.vcf 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.bcf

# bcftools >=1.4
rhocall annotate --v14 -r 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.14.roh -o 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.14.rho.vcf 2016-14676_sorted_md_rreal_brecal_gvcf_vrecal_comb_BOTH.bcf

# visualize: (using bcftools v1.9)
bcftools roh --AF-file /home/proj/development/rare-disease/references/grch37_anon_swegen_snp_-2016-10-19-.tab.gz -I F0010931_sorted_md_brecal_haptc_vrecal_comb_BOTH.bcf > F0010931_sorted_md_brecal_haptc_vrecal_comb_BOTH.roh
rhocall viz --wig --out_dir rhocall --aftag GNOMADAF --rho F0010931_sorted_md_brecal_haptc_vrecal_comb_BOTH.roh F0010931_sorted_md_brecal_haptc_vrecal_comb_rhocall_vt_frqf_vep_parsed_snpeff_ranked_BOTH.vcf

Test files

The test directory contains test files from the BCFtools/RoH project.

Installation

The cyvcf2 install process appears to be jinxed on certain systems/setups. In practice this means that a chained pip install on a naive system may fail. Installation of each requirement for cyvcf2 prior to installing it appears to work unconditionally.

pip install numpy; pip install Cython
pip install -r requirements.txt
pip install -e .