Merge pull request #10 from sigven/resolve_identical_format_info_tags
resolve duplicate INFO/FORMAT tags
sigven authored Mar 9, 2023
2 parents 9c74391 + 27f2d47 commit d2678df
Showing 3 changed files with 117 additions and 63 deletions.
7 changes: 6 additions & 1 deletion README.md
@@ -1,6 +1,6 @@
# vcf2tsvpy: genomic VCF to tab-separated values (TSV)

[![Anaconda-Server Badge](https://anaconda.org/bioconda/vcf2tsvpy/badges/installer/conda.svg)](https://conda.anaconda.org/bioconda)  [![Anaconda-Server Badge](https://anaconda.org/bioconda/vcf2tsvpy/badges/latest_release_date.svg)](https://anaconda.org/bioconda/vcf2tsvpy)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/vcf2tsvpy/badges/version.svg)](https://conda.anaconda.org/bioconda)  [![Anaconda-Server Badge](https://anaconda.org/bioconda/vcf2tsvpy/badges/latest_release_date.svg)](https://anaconda.org/bioconda/vcf2tsvpy)

A small Python program that converts genomic variant data encoded in [VCF format](https://samtools.github.io/hts-specs/VCFv4.2.pdf) into a tab-separated values (TSV) file.

@@ -16,6 +16,11 @@ The program utilizes the [cyvcf2](https://github.com/brentp/cyvcf2) library to p

**IMPORTANT**: If you run *vcf2tsvpy* on a large multi-sample VCF file, the output TSV grows quickly, since by default there is one line per sample genotype. Turn on `--skip_genotype_data` if you are primarily interested in the variant INFO elements; the output TSV will then also be considerably smaller.
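
To get a feel for the size difference, the row count can be estimated with a rough sketch like the one below. The function `estimate_tsv_rows` is hypothetical and not part of *vcf2tsvpy*; it assumes one data row per variant per sample by default, one data row per variant when genotype data is skipped, and it ignores header lines.

```python
def estimate_tsv_rows(n_variants, n_samples, skip_genotype_data=False):
    """Rough estimate of data rows in the output TSV (header lines excluded).

    Assumption: with genotype data, each variant yields one row per sample;
    without it (or with no samples), each variant yields a single row.
    """
    rows_per_variant = 1 if (skip_genotype_data or n_samples == 0) else n_samples
    return n_variants * rows_per_variant


# Example: 100,000 variants across 500 samples
with_genotypes = estimate_tsv_rows(100_000, 500)
without_genotypes = estimate_tsv_rows(100_000, 500, skip_genotype_data=True)
print(with_genotypes, without_genotypes)
```

Under these assumptions, the number of data rows shrinks by a factor equal to the number of samples when genotype data is skipped.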

## News

* March 9th 2023: **0.6.1 release**
- Handles tags that occur in __both__ the `INFO` and `FORMAT` columns of the VCF (e.g. `DP`). In such cases, the `INFO` tag name is now prefixed with *INFO_* (e.g. `INFO_DP`), so that column names in the output TSV file are unique.
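
The renaming rule described in the 0.6.1 note can be illustrated with a minimal sketch. This is not the project's actual implementation; the function name `resolve_duplicate_tags` and the example tag lists are made up for illustration, and only the `INFO_` prefix and the `DP` example come from the note above.

```python
def resolve_duplicate_tags(info_tags, format_tags):
    """Map each INFO tag to its output column name.

    INFO tags that also occur among the FORMAT tags get an 'INFO_' prefix;
    all other INFO tags keep their name. FORMAT tags are left untouched.
    """
    format_set = set(format_tags)
    return {
        tag: f"INFO_{tag}" if tag in format_set else tag
        for tag in info_tags
    }


# Hypothetical header content: DP is declared in both INFO and FORMAT
info_columns = resolve_duplicate_tags(["DP", "AF", "MQ"], ["GT", "DP", "AD"])
print(info_columns)  # {'DP': 'INFO_DP', 'AF': 'AF', 'MQ': 'MQ'}
```

Prefixing only the clashing `INFO` tags leaves every other column name in the output unchanged.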

## Installation

The software can be installed with the [Conda](https://docs.conda.io/en/latest/) package manager, using the following command:
13 changes: 13 additions & 0 deletions vcf2tsvpy.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX