seg faults #54

Open
kfattila opened this issue May 7, 2024 · 9 comments

@kfattila

kfattila commented May 7, 2024

I cloned Comet and compiled it successfully on May 6.

I tried to build a peptide index from the human proteome with the -i option, but Comet failed. I wanted to generate non-tryptic peptides of length 7 to 15 with common modifications. Comet used around 60 GB of RAM (an additional 60 GB was still free), and at around 50% of the task it produced the following error:

 Warning - invalid parameter found: variable_mod011.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod012.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod013.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod014.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod015.  Parameter will be ignored.

 Comet version "2024.01 rev. 0"

 Creating plain peptide/protein index file:
 - parse peptides from database ... WARNING: running job exception ... std::bad_alloc ... exiting ...
^C^C^C

I successfully built an index file with tryptic peptides.

Then I wanted to run a comet search with standard fasta files with the PXD017407 data set. I attached the parameter file.
comet.zip
I got the following error immediately:

 Warning - invalid parameter found: variable_mod011.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod012.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod013.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod014.  Parameter will be ignored.
 Warning - invalid parameter found: variable_mod015.  Parameter will be ignored.

 Comet version "2024.01 rev. 0"

 Search start:  05/06/2024, 02:16:58 PM
 - Input file: /blob/dda/PXD017407/20180630_SKMEL5_P1uM_L7_.mzML
   - Load spectra:malloc(): invalid next size (unsorted)
./run-tide-lite-benchmark.sh: line 12: 238957 Aborted                 $COMET -P/hdd/data/tide-lite-results/PXD017407/comet.params $DATA_DIR/*.mzML

1. I generated the Comet parameters file with the -q option and got this warning:
    Warning - invalid parameter found: variable_mod010. Parameter will be ignored.
2. There is a typo at the end of the parameter file:
    # Enzyme ntries can be added/deleted/edited
    I think you meant "entries", but it is OK if this is slang.

@jke000
Collaborator

jke000 commented May 7, 2024

Attila, thanks for posting this feedback. I just addressed the parameter file typos in the master branch and the 2024 release. If you pull from master again (or manually fix my horrible typo in the params file, changing variable_mod01X to variable_mod1X), those ugly warnings will go away.
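
For anyone who prefers to patch an existing params file rather than pull from master, the manual fix described above can be sketched in a few lines of Python. This is only an illustration: the function name `fix_variable_mod_names` is made up for this example, and it assumes the misnamed keys are exactly the `variable_mod010` to `variable_mod015` style names shown in the warnings, with the corrected names simply dropping the extra zero.

```python
import re

# Assumption: the misnamed keys look like variable_mod010 ... variable_mod019,
# as in the warnings quoted in this thread, and the fix is to drop the extra zero.
_BAD_KEY = re.compile(r"\bvariable_mod0(1\d)\b")


def fix_variable_mod_names(params_text: str) -> str:
    """Rename variable_mod01X keys to variable_mod1X and leave everything else as is."""
    return _BAD_KEY.sub(r"variable_mod\1", params_text)


if __name__ == "__main__":
    # Illustrative lines only; the values after "=" are placeholders.
    sample = "variable_mod01 = <unchanged>\nvariable_mod011 = <renamed>\n"
    print(fix_variable_mod_names(sample))
```

Keys such as `variable_mod01` are left untouched because the pattern requires two digits after the leading zero.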

I'll look at replicating your enzyme unrestricted search in the days to come and seeing if there's any hope in that search working any time soon. Creating the plain peptide .idx file should be the easiest step in the process so it's not good if that step is failing with lots of RAM still available. But regarding the simpler tryptic search, there might be some hope as it looks like you are mixing classic Comet and the index search.

The "Search start" and "Input file" output implies you're running classic Comet, searching a FASTA file. That search is running out of memory attempting to load the mzML file. Setting "spectrum_batch_size = 50000" or "spectrum_batch_size = 100000" in comet.params should allow that FASTA search to run to completion.

To run the tryptic index search, you will need to specify searching against the ".idx" file you created. You can do this either in comet.params or using the "-D" command line option, for example
comet -D/home/data/Fasta/uniprot-proteome_UP000005640.target-protrev.fasta.idx /blob/dda/PXD017407/20180630_SKMEL5_P1uM_L7_.mzML
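
When the search is launched from a script, the command line above can be assembled programmatically. The sketch below is only an illustration: the helper names `build_comet_command` and `is_index_database` are made up for this example, and the only Comet options used are `-D` and `-P`, both of which appear in this thread. The `.idx` suffix check is a simple guard against accidentally pointing at the plain FASTA file, not a description of how Comet itself picks the search mode.

```python
from typing import List, Optional


def is_index_database(database: str) -> bool:
    """Heuristic: treat a path ending in .idx as a peptide index file."""
    return database.endswith(".idx")


def build_comet_command(
    database: str,
    spectra: List[str],
    params: Optional[str] = None,
    comet: str = "comet",  # assumed name of the executable on PATH
) -> List[str]:
    """Return an argument list using the -P and -D options shown in this thread."""
    args = [comet]
    if params:
        args.append(f"-P{params}")
    args.append(f"-D{database}")
    args.extend(spectra)
    return args


if __name__ == "__main__":
    db = "uniprot-proteome_UP000005640.target-protrev.fasta.idx"
    if not is_index_database(db):
        print("Note: database does not end in .idx, so this is not an index search")
    print(build_comet_command(db, ["20180630_SKMEL5_P1uM_L7_.mzML"]))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the search.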

@kfattila
Author

kfattila commented May 7, 2024

Hi Jimmy. Many thanks for your prompt response. Yes, I did try to run a standard search with a FASTA file and an mzML file. I am quite sure that it is not an out-of-memory issue. I think there are only about 300 spectra in that file. You can download that spectra file and try to reproduce the error, or I can send it to you tomorrow.

@jke000
Collaborator

jke000 commented May 7, 2024

OK, point me to the mzML file. I ran a search using your params, a human target-decoy database, and a random spectral file I had, and it completed fine, so maybe there's some quirk in the file itself that I need to look at and account for.

@kfattila
Author

kfattila commented May 8, 2024

Here is the data. Less than 1 MB.
20180630_SKMEL5_P1uM_L7_.zip

@jke000
Collaborator

jke000 commented May 8, 2024

Attila, there's a parsing problem that's causing the run issue with that mzML file. Part of the problem stems from the lack of the optional scan index in the mzML, which leads to other issues. The easiest short-term workaround is to pass your mzML through msconvert, have it write a new mzML file, and use that file for the search. I'll leave this issue open to remind me to one day come back and poke through the parsing code to see if I can get Comet to run with this mzML.

msconvert --mzML --outfile new.mzML 20180630_SKMEL5_P1uM_L7_.mzML
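
To tell in advance which files might need this msconvert round trip, a quick check for the index can help. In the mzML format, an indexed file wraps the document in an `<indexedmzML>` root element that carries the offset index, while a file without the index starts directly with `<mzML>`. The sketch below only looks for that wrapper near the start of the file; the function name `has_mzml_index` and the 64 KB probe size are choices made for this example, and it is an assumption that this wrapper is the "optional scan index" referred to above.

```python
def has_mzml_index(path: str, probe_bytes: int = 65536) -> bool:
    """Return True if the start of the file contains an <indexedmzML> wrapper.

    This is a cheap text probe, not an XML parse: it reads only the first
    probe_bytes bytes, which is where the root element is expected to be.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    return b"<indexedmzML" in head


if __name__ == "__main__":
    import sys

    for mzml_path in sys.argv[1:]:
        status = "indexed" if has_mzml_index(mzml_path) else "no index wrapper found"
        print(f"{mzml_path}: {status}")
```

Files reported without the wrapper are candidates for rewriting with msconvert before the search.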

@jke000
Collaborator

jke000 commented May 9, 2024

And just to follow up on the very first issue, the "-i" plain peptide/protein index build that failed: can you make that database and your comet.params available to me? The Schweppe lab and I have run a similar analysis here before (human target+decoy FASTA, no enzyme constraint, 8 to 15 length limit, oxidized methionine variable mod). I just repeated the analysis using the 7 to 15 length limit and completed both the .idx creation and the search step on my Linux box, so there's some hope that it should work for you as well if I can replicate the memory allocation issue you're seeing.

@kfattila
Author

I'll send them to you on Monday.

@kfattila
Author

Hi, I've sent them by Slack.

@jke000
Collaborator

jke000 commented May 13, 2024

Memory use is a problem and a huge search space just exacerbates the issue. With your params and database, the .idx file creation failed for me on a node with 128GB RAM requested. Just before the process died, the memory use doubled. I then tried running on a node with 132GB RAM requested and that allowed the analysis to complete. I'll investigate that memory spike and work on doing more memory reduction as there's hope for a bit of savings. But until then, you'll need access to a box with more memory to run Comet's fragment ion indexing with these search parameters.

Here's the run using four cores on an Ubuntu node. That human.mzXML only has ~5K MS/MS spectra to search, so it's a lot of expensive processing for a small number of query spectra.

Screenshot 2024-05-13 111509
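
To give a rough feel for why the no-enzyme search space is so large, the number of candidate peptide windows in a single protein can be counted directly: a protein of length n contains n - k + 1 substrings of length k. The sketch below sums that over the 7 to 15 length range used in this thread. The function name is made up for this example, and the count covers plain sequence windows only; it ignores duplicate sequences, decoys, and variable modifications, and says nothing about how Comet actually lays out its index in memory.

```python
def count_unrestricted_peptides(protein_length: int, min_len: int = 7, max_len: int = 15) -> int:
    """Count substrings of length min_len..max_len in a protein of the given length."""
    return sum(
        max(0, protein_length - k + 1)  # windows of length k; zero if the protein is shorter
        for k in range(min_len, max_len + 1)
    )


if __name__ == "__main__":
    # A 500-residue protein yields 4410 candidate windows, roughly 9 per residue.
    print(count_unrestricted_peptides(500))
```

Each variable modification and the decoy sequences multiply this further, which is consistent with the index build needing far more memory than the tryptic case.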
