Quantum calculations crash with 'unknown error' #186

Open
simonlichtinger opened this issue Jun 9, 2022 · 6 comments
Labels
bug Something isn't working dependencies Pull requests that update a dependency file

Comments

@simonlichtinger

Dear bespoke fit devs,

Setup and how to reproduce

Ubuntu 18.04 LTS
Installation followed the working consensus from #183, where psi4 version 1.6 was installed from the psi4 channel instead of conda-forge.

Input molecule file:
ala-ala.sdf.zip


Running command:

openff-bespoke executor run --file   "ala-ala.sdf"   --workflow  "default"  --n-fragmenter-workers 2  --n-optimizer-workers  2  --n-qc-compute-workers 4 

What happens

The software reaches the qc-generation step and spends several hours on it using 12 cores. Then the following error occurs:

────────────────────────────────────────────────────────── OpenFF Bespoke ──────────────────────────────────────────────────────────

[✓] bespoke executor launched

1. preparing the bespoke workflow                                                                                                   
                                                                                                                                    
[✓] 1 molecules found
[✓] fitting schemas generated
                                                                                                                                    
2. submitting the workflow                                                                                                          
                                                                                                                                    
[✓] the following workflows were submitted
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━┓
┃ ID ┃ SMILES                               ┃ NAME ┃ FILE        ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━┩
│ 1  │ C[C@@H](C(=O)N[C@@H](C)C(=O)O)[NH3+] │ LIG  │ ala-ala.sdf │
└────┴──────────────────────────────────────┴──────┴─────────────┘
                                                                                                                                    
3. running the fitting pipeline                                                                                                     
                                                                                                                                    
⠇ fragmenting the moleculeWarning: Cannot perform Hydrogen sampling with GPU-Omega: GPU-Omega disabled.
[✓] fragmentation successful
⠼ generating bespoke QC dataWarning: Cannot perform Hydrogen sampling with GPU-Omega: GPU-Omega disabled.
Warning: Cannot perform Hydrogen sampling with GPU-Omega: GPU-Omega disabled.
Warning: Cannot perform Hydrogen sampling with GPU-Omega: GPU-Omega disabled.
Warning: Cannot perform Hydrogen sampling with GPU-Omega: GPU-Omega disabled.
[x] qc-generation failed
                                                                                                                                    
 [{"type": "ValueError", "message": "TorsionDrive error at -15:\ngeomeTRIC run_json error:\nTraceback (most recent call last):\n    
 File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/run_json.py\", line 225, in         
 geometric_run_json\n    geometric.optimize.Optimize(coords, M, IC, engine, None, params)\n  File                                   
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/optimize.py\", line 1331, in Optimize\n  
 return optimizer.optimizeGeometry()\n  File                                                                                        
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/optimize.py\", line 1298, in             
 optimizeGeometry\n    self.calcEnergyForce()\n  File                                                                               
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/optimize.py\", line 1002, in             
 calcEnergyForce\n    spcalc = self.engine.calc(self.X, self.dirname)\n  File                                                       
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/engine.py\", line 873, in calc\n         
 return self.calc_new(coords, dirname)\n  File                                                                                      
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/engine.py\", line 865, in calc_new\n     
 raise QCEngineAPIEngineError(\"QCEngineAPI computation did not execute correctly. Message: \" +                                    
 ret[\"error\"][\"error_message\"])\ngeometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly.    
 Message: QCEngine Unknown Error: Unknown error, error message is not found\n", "traceback": "Traceback (most recent call last):\n  
 File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/celery/app/trace.py\", line 451, in           
 trace_task\n    R = retval = fun(*args, **kwargs)\n  File                                                                          
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/celery/app/trace.py\", line 734, in                
 __protected_call__\n    return self.run(*args, **kwargs)\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/
 site-packages/openff/bespokefit/executor/services/qcgenerator/worker.py\", line 132, in compute_torsion_drive\n    return_value =  
 qcengine.compute_procedure(\n  File                                                                                                
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/qcengine/compute.py\", line 149, in                
 compute_procedure\n    return handle_output_metadata(output_data, metadata, raise_error=raise_error, return_dict=return_dict)\n    
 File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/qcengine/util.py\", line 177, in              
 handle_output_metadata\n    raise ValueError(output_fusion[\"error\"][\"error_message\"])\nValueError: TorsionDrive error at       
 -15:\ngeomeTRIC run_json error:\nTraceback (most recent call last):\n  File                                                        
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/run_json.py\", line 225, in              
 geometric_run_json\n    geometric.optimize.Optimize(coords, M, IC, engine, None, params)\n  File                                   
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/optimize.py\", line 1331, in Optimize\n  
 return optimizer.optimizeGeometry()\n  File                                                                                        
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/optimize.py\", line 1298, in             
 optimizeGeometry\n    self.calcEnergyForce()\n  File                                                                               
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/optimize.py\", line 1002, in             
 calcEnergyForce\n    spcalc = self.engine.calc(self.X, self.dirname)\n  File                                                       
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/engine.py\", line 873, in calc\n         
 return self.calc_new(coords, dirname)\n  File                                                                                      
 \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/geometric/engine.py\", line 865, in calc_new\n     
 raise QCEngineAPIEngineError(\"QCEngineAPI computation did not execute correctly. Message: \" +                                    
 ret[\"error\"][\"error_message\"])\ngeometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly.    
 Message: QCEngine Unknown Error: Unknown error, error message is not found\n\n"}, null, null, null]                                
                                                                                                                                    
outputs have been saved to output.json                                                                                              
                                                                                                                                    

worker: Warm shutdown (MainProcess)

worker: Warm shutdown (MainProcess)

worker: Warm shutdown (MainProcess)

The created output file is:
output.zip

What else I've tried

  • Running this again: same error

Nothing else. Because BespokeFit has so little verbosity, I've not been able to see where this comes from or at what stage it happens.

Further note

I have previously noted that in QM geometry optimisations this ligand sometimes behaves strangely at the B3LYP level (but not with HF): protons change location and the bonding pattern changes. I attributed that to the fact that a dipeptide with charged termini might not be stable in vacuum, and using an implicit solvent fixed it. However, I have no way of telling whether this is the issue here, because no intermediate files seem to be saved, and since implicit solvent is not supported to my knowledge(?) I have not been able to try this out.

Many thanks
Simon

@jthorton
Contributor

jthorton commented Jun 9, 2022

Hi @simonlichtinger, thanks for the report. Unfortunately, this has been an issue for a while now with the QCEngine interface to Psi4: some failure modes result in no useful error report. Our original issue is here and is still not resolved. If this case is a consistent source of failure, though, it may help to improve the error reporting from Psi4, and I will look into it. In the meantime, you may wish to change the level of theory to HF by creating and editing a fitting workflow and using that with BespokeFit.

implicit solvent is not supported to my knowledge(?)

Yes, currently we don't support using an implicit solvent for the QM data generation, but this should be straightforward to add, as the QCEngine/Psi4 interface does support it; it's just a case of passing the settings through the interface. It's something we could look at adding if it would be of use to you.

@simonlichtinger
Author

Thanks @jthorton. I will try running with HF and implicit solvent then. Since this takes hours to fail (if it does), and I am not sure I have understood the docs correctly, could you please confirm that these workflows are appropriate for ...

... using the HF method:

factory = BespokeWorkflowFactory()
ala_ala = Molecule.from_file("ala-ala.pdb")
qcspec = QCSpec(method='HF',basis='6-31G*',program='psi4')
factory.default_qc_specs = [qcspec]
workflow = factory.optimization_schema_from_molecule(ala_ala)

... using an implicit solvent (if this does indeed do what I think it does? Or is something else needed to make this happen? If so, that would certainly be a very useful feature for anyone with charged / highly dipolar ligands!)

factory = BespokeWorkflowFactory()
ala_ala = Molecule.from_file("ala-ala.pdb")
qcspec = QCSpec(method='B3LYP-D3BJ',basis='DZVP',program='psi4', implicit_solvent=PCMSettings(units='au', medium_Solvent='water'))
factory.default_qc_specs = [qcspec]
workflow = factory.optimization_schema_from_molecule(ala_ala)

Thanks,
Simon

@jthorton
Contributor

@simonlichtinger the HF workflow looks correct. The implicit solvent one, while correct, won't work, as we don't pass on the implicit solvent settings. This will require some changes to BespokeFit, which we can look at adding.

@simonlichtinger
Author

So I've now experimented with this.

Unfortunately, running the following Python script:

from openff.bespokefit.workflows import BespokeWorkflowFactory
from openff.toolkit.topology import Molecule
from openff.qcsubmit.common_structures import QCSpec, PCMSettings
from openff.bespokefit.executor import BespokeExecutor, BespokeWorkerConfig
from openff.bespokefit.executor import wait_until_complete



factory = BespokeWorkflowFactory()
ala_ala = Molecule.from_file("ala-ala.pdb")
qcspec = QCSpec(method='HF',basis='6-31G*',program='psi4')
factory.default_qc_specs = [qcspec]
workflow = factory.optimization_schema_from_molecule(ala_ala)

executor = BespokeExecutor(
    n_fragmenter_workers=12,
    fragmenter_worker_config=BespokeWorkerConfig(n_cores=1),
    n_qc_compute_workers=6,
    qc_compute_worker_config=BespokeWorkerConfig(n_cores='auto'),
    n_optimizer_workers=6,
    optimizer_worker_config=BespokeWorkerConfig(n_cores=2),
)

with executor:
    task_id = BespokeExecutor.submit(workflow)
    output = wait_until_complete(task_id)
if output.status == "success":
    output.bespoke_force_field.to_file("HF_ff.offxml")
elif output.status == "errored":
    print(output.error)

led to this error:

ValueError: TorsionDrive error at -30:\ngeomeTRIC run_json error:\nTraceback (most recent call last):\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/geometric/run_json.py\", line 225, in geometric_run_json\n    geometric.optimize.Optimize(coords, M, IC, engine, None, params)\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/geometric/optimize.py\", line 1331, in Optimize\n    return optimizer.optimizeGeometry()\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/geometric/optimize.py\", line 1293, in optimizeGeometry\n    self.calcEnergyForce()\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/geometric/optimize.py\", line 1002, in calcEnergyForce\n    spcalc = self.engine.calc(self.X, self.dirname)\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/geometric/engine.py\", line 873, in calc\n    return self.calc_new(coords, dirname)\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/geometric/engine.py\", line 865, in calc_new\n    raise QCEngineAPIEngineError(\"QCEngineAPI computation did not execute correctly. Message: \" + ret[\"error\"][\"error_message\"])\ngeometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly. 
Message: QCEngine Execution Error:\nTraceback (most recent call last):\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/qcengine/util.py\", line 114, in compute_wrapper\n    yield metadata\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/qcengine/compute.py\", line 91, in compute\n    output_data = executor.compute(input_data, config)\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/qcengine/programs/psi4.py\", line 121, in compute\n    pversion = parse_version(self.get_version())\n  File \"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit/lib/python3.9/site-packages/qcengine/programs/psi4.py\", line 91, in get_version\n    self.version_cache[which_prog] = safe_version(exc[\"stdout\"].split()[-1])\nIndexError: list index out of range
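
A note on the last frame of this traceback: the failing expression is `exc["stdout"].split()[-1]` in QCEngine's `get_version`. Taking the last token of an empty or whitespace-only string raises exactly this IndexError, so the message is consistent with the psi4 version query returning no output. It does not say why the output was empty. A minimal sketch of that behaviour, where `last_token` and `fails_like_traceback` are hypothetical helpers written for illustration and are not part of QCEngine:

```python
from typing import Optional


def last_token(stdout: str) -> Optional[str]:
    """Return the last whitespace-separated token, or None if there is none."""
    tokens = stdout.split()
    return tokens[-1] if tokens else None


def fails_like_traceback(stdout: str) -> bool:
    """Return True if the expression from the traceback raises IndexError."""
    try:
        stdout.split()[-1]
    except IndexError:
        return True
    return False
```

With an empty string the original expression fails, while the defensive variant returns None and leaves the caller to report a clearer error.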

The command line openff-bespoke executor run --file "ala-ala.sdf" --workflow "default" --n-fragmenter-workers 2 --n-optimizer-workers 2 --n-qc-compute-workers 3 --qc-compute-n-cores 2 --default-qc-spec psi4 HF 6-31G* did run; however, it crashed after the QM calculations had finished:

{"smiles": "[H:1][C@@:2]([C:3](=[O:4])[N:5]([H:6])[C@:7]([H:8])([C:9](=[O:10])[O:11][H:12])[C:13]([H:14])([H:15])[H:16])([C:17]([H:18])([H:19])[H:20])[N+:21]([H:22])([H:23])[H:24]", "stages": [{"type": "fragmentation", "status": "success", "error": "null"}, {"type": "qc-generation", "status": "success", "error": "[null, null, null, null]"}, {"type": "optimization", "status": "errored", "error": "{\"type\": \"RuntimeError\", \"message\": \"ConvergenceFailure: The optimization failed to converge.\\nNone\", \"traceback\": \"Traceback (most recent call last):\\n  File \\\"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/celery/app/trace.py\\\", line 451, in trace_task\\n    R = retval = fun(*args, **kwargs)\\n  File \\\"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/celery/app/trace.py\\\", line 734, in __protected_call__\\n    return self.run(*args, **kwargs)\\n  File \\\"/biggin/b197/sjoh5255/anaconda3/envs/bespokefit_2/lib/python3.9/site-packages/openff/bespokefit/executor/services/optimizer/worker.py\\\", line 74, in optimize\\n    raise (\\nRuntimeError: ConvergenceFailure: The optimization failed to converge.\\nNone\\n\"}"}], "results": null}

So there seems to be a problem with the fitting? Also, is there any way of retrying the fitting without redoing all of the quantum calculations?
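
Since the output above packs each stage's error into a JSON-encoded string, it can be hard to see at a glance which stage failed. The following is a minimal sketch, using only the standard library, that summarises the stages. It assumes the output has the shape shown above (a `stages` list whose entries have `type`, `status` and a JSON-encoded `error`); `summarize_stages` is a hypothetical helper, not part of BespokeFit.

```python
import json


def summarize_stages(output_json: str) -> list:
    """Return a (stage type, status, error messages) tuple for each stage."""
    output = json.loads(output_json)
    summary = []
    for stage in output.get("stages", []):
        raw = stage.get("error")
        # In the output shown above, "error" is itself a JSON-encoded string.
        try:
            decoded = json.loads(raw) if isinstance(raw, str) else raw
        except json.JSONDecodeError:
            decoded = raw
        # A stage may report one error or a list with one entry per task.
        entries = decoded if isinstance(decoded, list) else [decoded]
        messages = []
        for entry in entries:
            if entry is None:
                continue  # null means that task finished without an error
            if isinstance(entry, dict):
                messages.append(f"{entry.get('type')}: {entry.get('message')}")
            else:
                messages.append(str(entry))
        summary.append((stage.get("type"), stage.get("status"), messages))
    return summary
```

One way to use it would be to read the saved output file into a string, pass it to `summarize_stages`, and print each tuple on its own line.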

As for implicit solvent, that would be a very useful feature, please do implement it if you have a chance.

@jthorton
Contributor

Strange, it looks like the script ran into an error when trying to get the version from psi4. This could possibly be due to an oversubscription of resources: setting n_cores to 'auto' for the qc compute workers gives each of them access to every core, which can lead to issues. These are probably avoided in your CLI run by the option --qc-compute-n-cores 2. Glad the QM worked in the end!
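
One way to avoid oversubscription is to give each QC compute worker an explicit share of the machine rather than every core. The sketch below uses only the standard library; `cores_per_worker` is a hypothetical helper, and it ignores the cores used by the fragmenter and optimizer workers, so the result is an upper bound rather than a recommendation.

```python
import os
from typing import Optional


def cores_per_worker(n_workers: int, total_cores: Optional[int] = None) -> int:
    """Split the available cores evenly between workers, with at least one each."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if total_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // n_workers)


# Example: 6 QC compute workers on a 12-core machine get 2 cores each.
n_cores = cores_per_worker(n_workers=6, total_cores=12)
# The value could then be passed as BespokeWorkerConfig(n_cores=n_cores)
# in place of n_cores='auto'.
```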

To save the progress, you can add the --directory option to the command. This is the location where a database will be saved, meaning you can resubmit jobs and reuse cached data. If the optimisation did not converge with the default settings, you can edit them; the available ones are listed here. After building a workflow schema, you will then need to save it to a JSON file and supply it to the CLI with the option --workflow-file. The two options I would suggest changing are n_criteria and max_iterations. I hope this helps!
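
As a rough illustration of editing a saved workflow, the sketch below changes settings in a JSON file using only the standard library. It assumes the exported JSON stores the optimizer settings under keys literally named `max_iterations` and `n_criteria`; the real file layout is not shown in this thread, and `override_settings` and `edit_workflow_file` are hypothetical helpers, not part of BespokeFit.

```python
import json
from typing import Any


def override_settings(node: Any, overrides: dict) -> int:
    """Set every matching key in a nested JSON structure; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in overrides:
                node[key] = overrides[key]
                changed += 1
            else:
                changed += override_settings(value, overrides)
    elif isinstance(node, list):
        for item in node:
            changed += override_settings(item, overrides)
    return changed


def edit_workflow_file(path_in: str, path_out: str, overrides: dict) -> int:
    """Load a workflow JSON file, apply the overrides and write the result."""
    with open(path_in) as handle:
        schema = json.load(handle)
    changed = override_settings(schema, overrides)
    with open(path_out, "w") as handle:
        json.dump(schema, handle, indent=2)
    return changed
```

If the returned count is zero, the assumed key names did not match anything and the file was written unchanged, so it is worth checking the count before passing the edited file to --workflow-file.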

@jthorton jthorton added bug Something isn't working dependencies Pull requests that update a dependency file labels Jun 14, 2022
@simonlichtinger
Author

Thanks for the suggestion. It seems that the 'auto' setting in the Python script is indeed what triggers the bug; it appears to run if I set a number of cores.

With the CLI, I was now able to do a fit and obtain a force field. I will open a separate issue in due course about the interpretation of this.
