
Identification Input Formats

Robert Millikin edited this page Mar 15, 2021 · 5 revisions

Currently, FlashLFQ accepts the following file formats:

  • MetaMorpheus .psmtsv
  • Morpheus .tsv
  • MaxQuant msms.txt
  • PeptideShaker .tabular
  • Generic tab-delimited .txt or .tsv

If you are a search software developer and you would like us to natively support your identification format, please open an issue.

If you are using one of the search engines listed above (anything other than the generic format), you can use its output files as-is. FlashLFQ automatically reads the required information from them. Some engines' output formats have quirks, especially in what is considered a "shared peptide"; these are described below.

If you are using another search engine and wish to quantify your data with FlashLFQ, this is doable. You will need to format your identifications into the "generic" style so that FlashLFQ can interpret the identification results.

MetaMorpheus

  • False discovery rate filtering - FlashLFQ will quantify peptides below 1% FDR and 1% notch FDR.
  • Mass-differences - FlashLFQ looks for the peptide's theoretical mass, not the experimental mass, so an unidentified mass-difference will disrupt peak-finding for that peptide. FlashLFQ therefore does not quantify open-mass search results automatically. You can, however, try adding the mass-difference to the peptide's theoretical mass. You will also likely need to edit the "Full Sequence" to include the corresponding mass-shift.
  • Shared peptides - may not be completely accurate if you performed protein grouping. MetaMorpheus removes protein possibilities from the PSM output that were not identified by protein inference. These proteins are unlikely to be reliably detected in your sample, but they are still technically possibilities. If you performed protein grouping, a "shared peptide" is thus a peptide that belongs to an ambiguous protein group, or multiple protein groups.
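The mass-difference workaround described above can be sketched as follows. This is a minimal illustration, not FlashLFQ's or MetaMorpheus's own code: it assumes each PSM is a dictionary with "Full Sequence" and "Peptide Monoisotopic Mass" keys, and the helper name apply_mass_shift and the bracketed tag appended to the end of the sequence are hypothetical choices. Any annotation works as long as it is used consistently for the same peptidoform.

```python
def apply_mass_shift(psm, mass_shift, label=None):
    """Return a copy of a PSM row with a mass shift folded into the peptide.

    Assumption: `psm` is a dict keyed by column header. The tag format
    appended to "Full Sequence" is hypothetical, not a FlashLFQ requirement.
    """
    shifted = dict(psm)
    # Move the theoretical mass to where the precursor is actually observed.
    shifted["Peptide Monoisotopic Mass"] = (
        float(psm["Peptide Monoisotopic Mass"]) + mass_shift
    )
    # Mark the sequence so the shifted form is treated as its own peptidoform.
    tag = label if label is not None else f"{mass_shift:+.4f}"
    shifted["Full Sequence"] = f"{psm['Full Sequence']}[{tag}]"
    return shifted


# Illustrative row: an unmodified peptide observed with a +15.9949 Da shift.
psm = {"Full Sequence": "PEPTIDE", "Peptide Monoisotopic Mass": "799.3600"}
print(apply_mass_shift(psm, 15.9949))
```

The shift is appended to the end of the sequence only because an open-mass search does not localize it; if you know the modified residue, placing the tag there is equally valid.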

Morpheus

  • False discovery rate filtering - FlashLFQ will quantify peptides below 1% FDR.
  • Mass-differences - same as MetaMorpheus. If you try to do an open-mass or notch search with Morpheus, you will need to change the theoretical peptide mass and likely the modified sequence.
  • Shared peptides - Morpheus only lists one parent protein per PSM, even if there are multiple protein options. Thus "shared peptide" here is completely meaningless. Use caution.
  • Gene names - Gene names are not listed by Morpheus.
  • Organism names - Organism names are not listed by Morpheus.

MaxQuant

  • False discovery rate filtering - FlashLFQ assumes that the output is already filtered to a 1% FDR; thus, no FDR filtering is performed.
  • Shared peptides - it is assumed that MaxQuant reports all parent protein possibilities for each PSM. "Shared peptide" truly means "shared peptide".
  • Organism names - Organism names are not listed by MaxQuant.

PeptideShaker

  • False discovery rate filtering - same as MaxQuant. No FDR filtering is performed by FlashLFQ.
  • Shared peptides - same as MaxQuant. It is assumed that all parent protein possibilities are listed for each PSM.
  • Retention time - it is assumed that the RT is in seconds, not minutes. The RT will be divided by 60 because FlashLFQ's retention times are in minutes.
  • Gene names - Gene names are not listed by PeptideShaker.
  • Organism names - Organism names are not listed by PeptideShaker.
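If you convert PeptideShaker results to the generic format yourself, the same seconds-to-minutes conversion has to be applied by hand. A minimal sketch (the function name is hypothetical):

```python
def seconds_to_minutes(rt_seconds):
    """Convert a retention time in seconds to minutes."""
    return float(rt_seconds) / 60.0


print(seconds_to_minutes("1830"))  # 1830 s is 30.5 min
```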

Generic

The first line of the text file should contain column headers identifying each column. If your search software reports decoys and PSMs above 1% FDR, you may want to remove these before FlashLFQ analysis. FlashLFQ will probably crash if ambiguous PSMs are passed into it (e.g., a PSM with more than one peptide listed on a single line).
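The clean-up described above could look roughly like the sketch below. Everything about the input here is an assumption: the "Decoy" and "Q-Value" column names and the "|" separator between ambiguous peptides are hypothetical and will differ between search engines, so adapt them to your own output.

```python
def keep_psm(row, max_q=0.01):
    """Decide whether a PSM row should be passed on to FlashLFQ.

    Assumptions (hypothetical, engine-specific): decoys are flagged with
    "Y" in a "Decoy" column, the q-value is in a "Q-Value" column, and
    ambiguous PSMs list several peptides separated by "|".
    """
    if row.get("Decoy", "N") == "Y":
        return False  # drop decoys
    if float(row.get("Q-Value", 0.0)) > max_q:
        return False  # drop PSMs above the FDR threshold
    if "|" in row["Base Sequence"]:
        return False  # drop ambiguous PSMs
    return True


rows = [
    {"Base Sequence": "PEPTIDE", "Decoy": "N", "Q-Value": "0.001"},
    {"Base Sequence": "PEPTIDE", "Decoy": "Y", "Q-Value": "0.001"},
    {"Base Sequence": "PEPTIDE", "Decoy": "N", "Q-Value": "0.05"},
    {"Base Sequence": "PEPTLDE|PEPTIDE", "Decoy": "N", "Q-Value": "0.001"},
]
filtered = [row for row in rows if keep_psm(row)]
print(len(filtered))
```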

The following headers are required in the list of MS/MS identifications:

  • File Name - With or without file extension (e.g. MyFile or MyFile.mzML)

  • Base Sequence - Should only contain an amino acid sequence (e.g., PEPTIDE and not PEPT[Phosphorylation]IDE)

  • Full Sequence - Modified sequence. Can contain any characters (e.g., PEPT[Phosphorylation]IDE is fine), but every PSM of the same peptidoform must use exactly the same string to get accurate results

  • Peptide Monoisotopic Mass - Theoretical monoisotopic mass, including modification mass

  • Scan Retention Time - MS/MS identification scan retention time in minutes

  • Precursor Charge - Charge of the ion selected for MS/MS resulting in the identification. Use the number only (e.g., "3" and not "+3")

  • Protein Accession - Protein accession(s) for the peptide. It is important to list all of the parent protein options if you want the "shared peptides" to be accurate. Use the semicolon (;) to delimit different proteins.

Click here to download an example of a Generic format PSM file.
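For search engines without native support, a generic file with the required headers can be produced with a short script. The sketch below uses only Python's standard csv module. The function name write_generic_psms, the input dictionaries, and the protein accessions are hypothetical; only the header names and the formatting rules (charge as a bare number, semicolon-delimited accessions, retention time in minutes) come from the list above.

```python
import csv
import io

# Required column headers for the generic format, as listed above.
REQUIRED_HEADERS = [
    "File Name",
    "Base Sequence",
    "Full Sequence",
    "Peptide Monoisotopic Mass",
    "Scan Retention Time",
    "Precursor Charge",
    "Protein Accession",
]


def write_generic_psms(psms, handle):
    """Write PSM dicts to `handle` as a tab-delimited generic file."""
    writer = csv.DictWriter(
        handle, fieldnames=REQUIRED_HEADERS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for psm in psms:
        missing = [h for h in REQUIRED_HEADERS if h not in psm]
        if missing:
            raise ValueError(f"PSM is missing required columns: {missing}")
        row = {h: psm[h] for h in REQUIRED_HEADERS}
        # Charge must be a bare number ("3", not "+3").
        row["Precursor Charge"] = int(str(psm["Precursor Charge"]).lstrip("+"))
        # List every parent protein, delimited by semicolons.
        accessions = psm["Protein Accession"]
        if not isinstance(accessions, str):
            accessions = ";".join(accessions)
        row["Protein Accession"] = accessions
        writer.writerow(row)


# Illustrative PSM; the accessions are placeholders, not real identifications.
example = {
    "File Name": "MyFile.mzML",
    "Base Sequence": "PEPTIDE",
    "Full Sequence": "PEPT[Phosphorylation]IDE",
    "Peptide Monoisotopic Mass": 879.3263,
    "Scan Retention Time": 30.5,  # minutes
    "Precursor Charge": "+3",
    "Protein Accession": ["P12345", "Q67890"],
}
buffer = io.StringIO()
write_generic_psms([example], buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; to produce a real file, pass a handle opened with open(path, "w", newline="") instead.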