Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCFHeader and VCFHeaderLine rewrite/refactoring #1581

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions src/main/java/htsjdk/samtools/Defaults.java
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,11 @@ public class Defaults {
*/
public static final boolean DISABLE_SNAPPY_COMPRESSOR;

/**
* Strict VCF version validation. Default = true.
*/
public static final boolean STRICT_VCF_VERSION_VALIDATION;


public static final String SAMJDK_PREFIX = "samjdk.";
static {
Expand All @@ -134,6 +139,7 @@ public class Defaults {
SAM_FLAG_FIELD_FORMAT = SamFlagField.valueOf(getStringProperty("sam_flag_field_format", SamFlagField.DECIMAL.name()));
SRA_LIBRARIES_DOWNLOAD = getBooleanProperty("sra_libraries_download", false);
DISABLE_SNAPPY_COMPRESSOR = getBooleanProperty(DISABLE_SNAPPY_PROPERTY_NAME, false);
STRICT_VCF_VERSION_VALIDATION = getBooleanProperty("strict_version_validation", true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the property name also "strict_vcf_version_validation".

}

/**
Expand Down
15 changes: 15 additions & 0 deletions src/main/java/htsjdk/samtools/SAMSequenceDictionary.java
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,13 @@ public SAMSequenceDictionary(final List<SAMSequenceRecord> list) {
setSequences(list);
}

//TODO: this returns sequences in the internal list order instead of
// honoring each sequence's contigIndex
/**
* Get a list of sequences for this dictionary.
* @return the list of sequences for this dictionary in internal order (the order in which the sequences
* were added to this dictionary)
*/
public List<SAMSequenceRecord> getSequences() {
return Collections.unmodifiableList(mSequences);
}
Expand All @@ -75,6 +82,14 @@ public void setSequences(final List<SAMSequenceRecord> list) {
list.forEach(this::addSequence);
}

/**
* Add a sequence to the dictionary.
* @param sequenceRecord the sequence record to add - note that this method mutates the contig
* index of the sequenceRecord to match the newly added record's relative
* order in the list
*/
//TODO: this method ignores (and actually mutates) the sequenceRecord's contig index to make it match
// the record's relative placement in the dictionary's internal list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a nasty side effect that we've been living with forever.

public void addSequence(final SAMSequenceRecord sequenceRecord) {
if (mSequenceMap.containsKey(sequenceRecord.getSequenceName())) {
throw new IllegalArgumentException("Cannot add sequence that already exists in SAMSequenceDictionary: " +
Expand Down
346 changes: 346 additions & 0 deletions src/main/java/htsjdk/samtools/SAMSequenceDictionaryUtils.java

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions src/main/java/htsjdk/tribble/TribbleException.java
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,12 @@ public static class InternalCodecException extends TribbleException {
public InternalCodecException(String message) { super (message); }
}

public static class VersionValidationFailure extends TribbleException {
public VersionValidationFailure(final String message) {
super(String.format("Version validation failure: %s", message));
}
}

// //////////////////////////////////////////////////////////////////////
// Index exceptions
// //////////////////////////////////////////////////////////////////////
Expand Down
29 changes: 19 additions & 10 deletions src/main/java/htsjdk/variant/bcf2/BCF2Utils.java
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,11 @@

import htsjdk.samtools.util.FileExtensions;
import htsjdk.tribble.TribbleException;
import htsjdk.variant.vcf.*;
import htsjdk.variant.vcf.VCFConstants;
import htsjdk.variant.vcf.VCFHeader;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFIDHeaderLine;
import htsjdk.variant.vcf.VCFSimpleHeaderLine;

import java.io.File;
import java.io.FileNotFoundException;
Expand Down Expand Up @@ -93,10 +97,15 @@ public static ArrayList<String> makeDictionary(final VCFHeader header) {
// set up the strings dictionary
for ( VCFHeaderLine line : header.getMetaDataInInputOrder() ) {
if ( line.shouldBeAddedToDictionary() ) {
final VCFIDHeaderLine idLine = (VCFIDHeaderLine)line;
if ( ! seen.contains(idLine.getID())) {
dict.add(idLine.getID());
seen.add(idLine.getID());
if (!line.isIDHeaderLine()) {
//is there a better way to ensure that shouldBeAddedToDictionary==true only when isIDHeaderLine==true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether shouldBeAddedToDictionary should be removed from VCFHeaderLine and replaced with something like a static method here, since it's a BCF 2.2 specific implementation detail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do that in the BCF branch.

throw new TribbleException(String.format(
"The header line %s cannot be added to the BCF dictionary since its not an ID header line",
line));
}
if ( ! seen.contains(line.getID())) {
dict.add(line.getID());
seen.add(line.getID());
}
}
}
Expand Down Expand Up @@ -291,7 +300,7 @@ else if ( o.getClass().isArray() ) {
* Are the elements and their order in the output and input headers consistent so that
* we can write out the raw genotypes block without decoding and recoding it?
*
* If the order of INFO, FILTER, or contrig elements in the output header is different than
* If the order of INFO, FILTER, or contig elements in the output header is different than
* in the input header we must decode the blocks using the input header and then recode them
* based on the new output order.
*
Expand All @@ -308,15 +317,15 @@ public static boolean headerLinesAreOrderedConsistently(final VCFHeader outputHe
if ( ! nullAsEmpty(outputHeader.getSampleNamesInOrder()).equals(nullAsEmpty(genotypesBlockHeader.getSampleNamesInOrder())) )
return false;

final Iterator<? extends VCFIDHeaderLine> outputLinesIt = outputHeader.getIDHeaderLines().iterator();
final Iterator<? extends VCFIDHeaderLine> inputLinesIt = genotypesBlockHeader.getIDHeaderLines().iterator();
final Iterator<VCFSimpleHeaderLine> outputLinesIt = outputHeader.getIDHeaderLines().iterator();
final Iterator<VCFSimpleHeaderLine> inputLinesIt = genotypesBlockHeader.getIDHeaderLines().iterator();

while ( inputLinesIt.hasNext() ) {
if ( ! outputLinesIt.hasNext() ) // missing lines in output
return false;

final VCFIDHeaderLine outputLine = outputLinesIt.next();
final VCFIDHeaderLine inputLine = inputLinesIt.next();
final VCFSimpleHeaderLine outputLine = outputLinesIt.next();
final VCFSimpleHeaderLine inputLine = inputLinesIt.next();

if ( ! inputLine.getClass().equals(outputLine.getClass()) || ! inputLine.getID().equals(outputLine.getID()) )
return false;
Expand Down
54 changes: 41 additions & 13 deletions src/main/java/htsjdk/variant/variantcontext/writer/VCFWriter.java
Original file line number Diff line number Diff line change
Expand Up @@ -27,15 +27,19 @@

import htsjdk.samtools.SAMSequenceDictionary;
import htsjdk.samtools.util.IOUtil;
import htsjdk.samtools.util.Log;
import htsjdk.samtools.util.RuntimeIOException;
import htsjdk.tribble.TribbleException;
import htsjdk.tribble.index.IndexCreator;
import htsjdk.utils.ValidationUtils;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.variantcontext.VariantContextBuilder;
import htsjdk.variant.vcf.VCFConstants;
import htsjdk.variant.vcf.VCFEncoder;
import htsjdk.variant.vcf.VCFHeader;
import htsjdk.variant.vcf.VCFHeaderLine;
import htsjdk.variant.vcf.VCFHeaderVersion;
import htsjdk.variant.vcf.VCFUtils;

import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
Expand All @@ -45,14 +49,15 @@
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.file.Path;
import java.util.stream.Collectors;

/**
* this class writes VCF files
*/
class VCFWriter extends IndexingVariantContextWriter {
protected final static Log logger = Log.getInstance(VCFWriter.class);

private static final String VERSION_LINE =
VCFHeader.METADATA_INDICATOR + VCFHeaderVersion.VCF4_2.getFormatString() + "=" + VCFHeaderVersion.VCF4_2.getVersionString();
private static final String DEFAULT_VERSION_LINE = VCFHeader.DEFAULT_VCF_VERSION.toHeaderVersionLine();

// Initialized when the header is written to the output stream
private VCFEncoder vcfEncoder = null;
Expand Down Expand Up @@ -164,7 +169,7 @@ public void writeHeader(final VCFHeader header) {
}

public static String getVersionLine() {
return VERSION_LINE;
return DEFAULT_VERSION_LINE;
}

public static VCFHeader writeHeader(VCFHeader header,
Expand All @@ -175,12 +180,18 @@ public static VCFHeader writeHeader(VCFHeader header,
try {
rejectVCFV43Headers(header);

// the file format field needs to be written first
// Validate that the file version we're writing is version-compatible this header's version.
validateHeaderVersion(header, versionLine);

// The file format field needs to be written first; below any file format lines
// embedded in the header will be removed
writer.write(versionLine + "\n");

for (final VCFHeaderLine line : header.getMetaDataInSortedOrder() ) {
if ( VCFHeaderVersion.isFormatString(line.getKey()) )
// Remove the fileformat header lines
if ( VCFHeaderVersion.isFormatString(line.getKey()) ) {
continue;
}

writer.write(VCFHeader.METADATA_INDICATOR);
writer.write(line.toString());
Expand All @@ -189,14 +200,9 @@ public static VCFHeader writeHeader(VCFHeader header,

// write out the column line
writer.write(VCFHeader.HEADER_INDICATOR);
boolean isFirst = true;
for (final VCFHeader.HEADER_FIELDS field : header.getHeaderFields() ) {
if ( isFirst )
isFirst = false; // don't write out a field separator
else
writer.write(VCFConstants.FIELD_SEPARATOR);
writer.write(field.toString());
}
writer.write(header.getHeaderFields().stream()
.map(f -> f.name())
.collect(Collectors.joining(VCFConstants.FIELD_SEPARATOR)).toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this too string is unecessary?


if ( header.hasGenotypingData() ) {
writer.write(VCFConstants.FIELD_SEPARATOR);
Expand Down Expand Up @@ -274,6 +280,28 @@ private static void rejectVCFV43Headers(final VCFHeader targetHeader) {
if (targetHeader.getVCFHeaderVersion() != null && targetHeader.getVCFHeaderVersion().isAtLeastAsRecentAs(VCFHeaderVersion.VCF4_3)) {
throw new IllegalArgumentException(String.format("Writing VCF version %s is not implemented", targetHeader.getVCFHeaderVersion()));
}
}

// Given a header and a requested target output version, see if the header's version is compatible with the
// requested version (where compatible means its ok to just declare that the header has the requested
// version).
private static void validateHeaderVersion(final VCFHeader header, final String requestedVersionLine) {
ValidationUtils.nonNull(header);
ValidationUtils.nonNull(requestedVersionLine);

final VCFHeaderVersion vcfCurrentVersion = header.getVCFHeaderVersion();
final VCFHeaderVersion vcfRequestedVersion = VCFHeaderVersion.fromHeaderVersionLine(requestedVersionLine);
if (!vcfCurrentVersion.equals(vcfRequestedVersion)) {
if (!VCFHeaderVersion.versionsAreCompatible(VCFHeaderVersion.fromHeaderVersionLine(requestedVersionLine), vcfCurrentVersion)) {
final String message = String.format("Attempting to write a %s VCF header to a %s VCFWriter",
vcfRequestedVersion,
vcfCurrentVersion.getVersionString());
if (VCFUtils.isStrictVCFVersionValidation()) {
throw new TribbleException(message);
}
logger.warn(message);
}
}
}

}
Loading