Maf documentation #73

Open
wants to merge 26 commits into
base: develop

@svburke svburke commented May 22, 2020

No description provided.

|---|---|---|---|
|Annotated_somatic_mutation|Controlled |Annotated VCF|MAF produced from one caller at the aliquot level.|
|Aggregated_somatic_mutation|Controlled |Aggregation of VCFs into one MAF file (*.protected.maf.gz)|Aggregation of aliquot-level MAFs|
|Masked_somatic_mutation|Open\* |Filtered version of aggregated_somatic_mutation MAF (*.somatic.maf.gz)|Filtered aggregation of aliquot-level MAFs|

we don't use the protected or somatic naming anymore

@kmhernan kmhernan left a comment

I like the figure, but the numbered list needs some work

* __deleterious_low_confidence__: Less likely to have a phenotypic effect than 'deleterious'
### Aliquot-Level MAF Files (Data Release >17.0)

Aliquot-level MAF files, annotated somatic mutations, are produced for each aliquot per variant caller. These files are then run through the Aliquot Ensemble Somatic Variant Merging and Masking workflow. There are a few filters that are applied at this step. The variants must be somatic, the variant size must be ≤ 50 bp, and it must pass the filters for the caller, except for MuSE which passes on filters for Tier 1-4 and the panel of normals. From this workflow two files are produced, aggregated somatic mutation and masked somatic mutation. The aggregated somatic mutation file is the aggregation of all variants from the multiple variant callers for each aliquot with these applied filters. The masked somatic mutation file is the aggregation of all variants from the multiple variant callers for each aliquot, which are then passed through a second filtering process.

It's really for each tumor-normal pair of aliquots

> except for MuSE which passes on filters for Tier 1-4 and the panel of normals

This may confuse people, because this panel of normals is specific to MuTect2, which tags it in the VCF. We also have our own PoN filter that is applied to all MAFs, and it isn't the one we are referring to here.

> These files are then run through the Aliquot Ensemble Somatic Variant Merging and Masking workflow.

Maybe this should be explicit: For each tumor-normal pair, the per-caller Aliquot-level MAF files are then run through the...

edit: ok I see you clarify this at the end of the paragraph

Re: MuSE and the PoN @kmhernan
Which panel of normals is used for MuSE? Is it the MuTect2 PoN, the GDC filter PoN, or some mysterious 3rd PoN?
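
For reference while this thread is open, the first-pass filters described in the paragraph under review (somatic only, variant size ≤ 50 bp, caller filters passed, with MuSE also accepting Tier 1-4) could be sketched roughly as below. This is only an illustration, not the workflow's actual code: the `Variant` record, its field names, the size definition, and the `Tier1`..`Tier4` tag spellings are all assumptions, and the MuSE panel-of-normals question above is deliberately left out.

```python
from dataclasses import dataclass, field

MAX_VARIANT_SIZE = 50  # bp, taken from the paragraph under review

# Assumption: MuSE tier tags are spelled "Tier1".."Tier4" in the FILTER field.
MUSE_ACCEPTED = {"PASS", "Tier1", "Tier2", "Tier3", "Tier4"}


@dataclass
class Variant:
    """Hypothetical per-caller record; the real workflow reads annotated VCF/MAF rows."""
    caller: str
    ref: str
    alt: str
    is_somatic: bool
    filters: set = field(default_factory=lambda: {"PASS"})


def variant_size(v):
    # Assumption: size is the longer of the two alleles.
    return max(len(v.ref), len(v.alt))


def passes_caller_filters(v):
    if not v.filters:
        return False
    if v.caller.lower() == "muse":
        # MuSE: every FILTER tag must be in the accepted set.
        return v.filters <= MUSE_ACCEPTED
    # Other callers: only a clean PASS is accepted.
    return v.filters == {"PASS"}


def keep_variant(v):
    return (
        v.is_somatic
        and variant_size(v) <= MAX_VARIANT_SIZE
        and passes_caller_filters(v)
    )
```

Keeping the MuSE set as a single constant means that whichever PoN turns out to apply, only one place would need to change.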

* The annotated somatic mutations aliquot-level MAF files, produced from the different callers, are merged into one raw merged aliquot-level MAF file. Then selection for the variants are made based on the following low quality variant filtering and germline masking:
1. The variant must occur within at least two of the callers.
2. Remaining variants with __FILTER != panel_of_normals__ are __removed__. Note that the `FILTER != panel_of_normals` value is only relevant for the variants generated from the MuTect2 pipeline.
3. The __non-TCGA exac allele frequency__ variants (0.001; common\_in\_exac) are __kept__.

I'm confused: we remove variants with a non-TCGA ExAC allele frequency greater than the cutoff; however, they can be rescued based on item number 4

The process for modifying the aliquot-level MAF files into a masked somatic mutation aliquot-level MAF is as follows:

* The annotated somatic mutations aliquot-level MAF files, produced from the different callers, are merged into one raw merged aliquot-level MAF file. Then selection for the variants are made based on the following low quality variant filtering and germline masking:
1. The variant must occur within at least two of the callers.

Maybe: The variant must be supported by at least two of the callers
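
To make the "supported by at least two callers" wording concrete, here is a minimal sketch of that check. It assumes each per-caller row is a dict using the standard MAF column names `Chromosome`, `Start_Position`, `End_Position`, `Reference_Allele`, and `Tumor_Seq_Allele2`, plus a hypothetical `caller` key. The function names are illustrative and not taken from the workflow.

```python
from collections import defaultdict


def variant_key(row):
    # Identify a variant by position and alleles, independent of the caller.
    return (
        row["Chromosome"],
        row["Start_Position"],
        row["End_Position"],
        row["Reference_Allele"],
        row["Tumor_Seq_Allele2"],
    )


def supported_by_min_callers(rows, min_callers=2):
    """Return {variant_key: sorted caller names} for variants with enough support."""
    callers_by_variant = defaultdict(set)
    for row in rows:
        # "caller" is a hypothetical field naming the pipeline that emitted the row.
        callers_by_variant[variant_key(row)].add(row["caller"])
    return {
        key: sorted(callers)
        for key, callers in callers_by_variant.items()
        if len(callers) >= min_callers
    }
```

Collecting callers in a set means a variant reported twice by the same caller still counts as one supporting caller.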


## MAF File Structure

The MAF files structure can be found in the following github repository:

The MAF columns are defined in the following github repository

Do you have a new link we can use for this? The one below cannot be accessed
