Interpreting MSFragger Output

By default, MSFragger generates a pepXML file (<run name>.pepXML) for every spectral file searched. For open searches and mass offset searches, a tab-separated file (<run name>.tsv) of the search hits is also generated by default. To generate this TSV file in other situations, select the 'TSV_PEPXML' output format in FragPipe (MSFragger tab, Advanced Output Options) or, when running from the command line, set output_format = tsv_pepXML in the fragger.params file.
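For command-line use, the output_format setting can be changed programmatically. The following is a minimal sketch, assuming the parameter file uses "key = value" lines with optional trailing "#" comments. The helper name set_output_format and the sample parameter text are hypothetical and are not part of MSFragger.

```python
import re

# Matches a single "output_format = <value>" line, keeping any trailing comment.
_OUTPUT_FORMAT = re.compile(
    r"^([ \t]*output_format[ \t]*=[ \t]*)(\S*)(.*)$", re.MULTILINE
)


def set_output_format(params_text, value="tsv_pepXML"):
    """Return params_text with output_format set to `value`.

    Hypothetical helper: if no output_format line exists, one is appended.
    """
    if _OUTPUT_FORMAT.search(params_text):
        return _OUTPUT_FORMAT.sub(
            lambda m: m.group(1) + value + m.group(3), params_text, count=1
        )
    return params_text.rstrip("\n") + "\noutput_format = " + value + "\n"


# Illustrative parameter text only, not a complete fragger.params file.
example = "num_threads = 0\noutput_format = pepXML  # output file format\n"
updated = set_output_format(example)
print(updated)
```

To apply this to a real file, read the text of fragger.params, pass it through the helper, and write the result back before starting the search.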

The pepXML outputs can be used directly for downstream processing (e.g., FDR control, protein inference) with PeptideProphet in TPP. To view results, or to convert them to other peptide identification formats for pipelines or tools that do not support pepXML, we recommend first converting to the mzIdentML format with idconvert, which is part of the ProteoWizard package.

Please note: The pepXML files produced by MSFragger may have additional attributes (e.g., uncalibrated_precursor_neutral_mass and ion_mobility) not in the original schema. According to our tests, both PeptideProphet and Philosopher can process those additional attributes.

The output fields of the TSV file (if enabled) produced by MSFragger are listed below:

scannum scan number of the MS/MS spectrum within the spectral file

precursor_neutral_mass neutral mass of the identified peptide ion as measured (in Da)

retention_time MS/MS spectrum retention time (in minutes)

charge charge state of the identified peptide ion

hit_rank position of the identification within all matches to the spectrum (1=highest scoring match)

peptide stripped amino acid sequence of the identified peptide

peptide_prev_aa amino acid directly preceding the identified peptide within the mapped protein sequence

peptide_next_aa amino acid directly following the identified peptide within the mapped protein sequence

protein complete FASTA header of the originating protein sequence

num_matched_ions count of fragment ions matching the identified peptide sequence (includes mass-shifted ions from localization-aware matching)

tot_num_ions total count of theoretical fragment ions from the peptide

calc_neutral_pep_mass theoretical mass of the identified peptide ion (in Da)

massdiff difference between measured and theoretical precursor neutral mass

num_tol_term number of enzymatic termini (2=fully enzymatic, 1=semi-enzymatic, 0=non-enzymatic)

num_missed_cleavages number of missed enzymatic cleavage sites in the identified peptide sequence

modification_info position, identity, and mass of each identified modification specified as fixed or variable in the search (does not include mass differences from open or mass offset searches); multiple modifications are comma-separated

hyperscore similarity score between observed and theoretical spectra; higher values indicate greater similarity

nextscore similarity score (hyperscore) of second-highest scoring match for the spectrum

expectscore expectation score of the peptide-spectrum match; lower values indicate a more likely correct match

best_locs peptide sequence with most probable delta mass (massdiff) locations indicated with lowercase letters

score_without_delta_mass similarity score (hyperscore) if no delta mass is included on the peptide

best_score_with_delta_mass similarity score (hyperscore) if the indicated delta mass (massdiff) is included on the peptide

second_best_score_with_delta_mass similarity score (hyperscore) of the second-highest scoring match if the indicated delta mass (massdiff) is included on the peptide

delta_score similarity score difference between best_score_with_delta_mass and second_best_score_with_delta_mass

alternative_proteins FASTA headers of any additional proteins the identified peptide maps to; multiple headers are separated by @@
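The fields above can be read with any tab-separated parser. The following is a minimal sketch using Python's standard csv module. It assumes the TSV begins with a header row whose column names match the field names listed above, and that massdiff is the measured mass minus the theoretical mass. The sample rows, protein names, and helper functions are hypothetical, and the mass values are made up rather than chemically accurate.

```python
import csv
import io

# Made-up rows for illustration; only a subset of the columns is shown.
HEADER = ["scannum", "hit_rank", "peptide", "precursor_neutral_mass",
          "calc_neutral_pep_mass", "massdiff", "best_locs", "protein",
          "alternative_proteins"]
ROWS = [
    ["101", "1", "PEPTIDEK", "1007.4600", "927.4600", "80.0000", "PEPtIDEK",
     "protA example header", "protB example header@@protC example header"],
    ["101", "2", "PEPTLDEK", "1007.4600", "927.4600", "80.0000", "PEPtLDEK",
     "protE example header", ""],
    ["102", "1", "SAMPLER", "800.4000", "800.4000", "0.0000", "",
     "protD example header", ""],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def top_hits(tsv_text):
    """Keep only the highest scoring match (hit_rank == 1) per spectrum."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if int(row["hit_rank"]) == 1]


def all_proteins(row):
    """Combine the protein column with the @@-separated alternatives."""
    extra = row.get("alternative_proteins") or ""
    return [row["protein"]] + [p for p in extra.split("@@") if p]


def localized_positions(best_locs):
    """Return 1-based positions of lowercase residues in best_locs."""
    return [i for i, aa in enumerate(best_locs, start=1) if aa.islower()]


def recomputed_massdiff(row):
    """Measured minus theoretical mass (sign convention is an assumption)."""
    return float(row["precursor_neutral_mass"]) - float(row["calc_neutral_pep_mass"])


hits = top_hits(SAMPLE_TSV)
for hit in hits:
    print(hit["scannum"], hit["peptide"], all_proteins(hit),
          localized_positions(hit["best_locs"]),
          round(recomputed_massdiff(hit), 4))
```

For real output, replace io.StringIO(tsv_text) with an opened <run name>.tsv file. Comparing the recomputed difference with the reported massdiff column is a quick consistency check on the assumed sign convention.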


