[BBPBGLIB-1139] config / model error logging fix, stage 2 #146

atemerev · 2024-03-21T13:52:12Z

Context

At simulation launch in commands.py, exceptions were handled the same way for configuration / model loading errors and for simulation errors. This required awkward synchronization to make sure that each exception was logged on a single MPI node only; otherwise the messages would flood the output.

The idea of this PR is to separate exception handling for configuration parsing / model loading (these errors are the same on all nodes and are supposed to be logged only on the master node) from exception handling for simulation errors (these can happen in only some runs, can differ from node to node, and may need to be logged on all nodes).

Scope

Separate exception handling at the model loading stage from exception handling at the simulation run stage. Call _mpi_abort only in the latter case. In the former case, log errors only on the node with MPI rank 0.
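
The split described above could look roughly like the sketch below. It is not the code from this PR: `launch`, `load_model`, `run_simulation`, `rank` and `abort` are hypothetical names chosen for illustration, `abort` stands in for `_mpi_abort`, and `ConfigurationError` is redefined locally so the snippet runs without neurodamus or MPI.

```python
import logging


class ConfigurationError(Exception):
    """Local stand-in for the configuration error type (sketch only)."""


def launch(load_model, run_simulation, rank, abort):
    """Run both stages with separate exception handling.

    All four parameters are hypothetical hooks used for illustration.
    Returns a process exit code.
    """
    try:
        model = load_model()
    except Exception as e:
        # Stage 1: assumed to fail identically on every rank,
        # so only rank 0 reports it and nobody calls abort.
        if rank == 0:
            logging.error("Configuration / model loading failed: %s", e)
        return 1

    try:
        run_simulation(model)
    except Exception:
        # Stage 2: may differ from rank to rank, so every affected rank
        # logs it and then asks for the whole job to be aborted.
        logging.exception("Simulation failed on rank %d", rank)
        abort(1)
        return 1
    return 0
```

The sketch relies on the assumption that stage 1 errors really are identical on all ranks; if they are not, a rank other than 0 would fail silently here.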

Testing

Again, I don't think it is feasible to write a unit test for this.

Review

PR description is complete
Coding style (imports, function length, New functions, classes or files) are good
[N/A] Unit/Scientific test added
Updated Readme, in-code, developer documentation

…nning. Remove the synchronization mechanism for error logging. Config/model errors are logged only on Rank 0; other exceptions can be different in different parts of the simulation and logged everywhere.

bbpbuildbot · 2024-03-21T14:29:28Z

Logfiles from GitLab pipeline #201647 (:no_entry:) have been uploaded here!

WeinaJi · 2024-03-22T08:42:02Z

Hi @atemerev, there is the exception type ConfigurationError, which should be raised for errors while reading config files. This exception should be raised by all ranks and caught properly. Errors during modelling, such as reading and creating cells and synapses, are more complex. They may happen on some ranks but not all. One example I can think of is loading the EModel hoc template in Cell_V6._instantiate_cell, where the EModel files come from scientists and some of them may contain errors. As different cells require different EModel templates, the EModel files loaded on each rank are not the same, meaning that errors may occur on only some of the ranks. In this case, we may not be able to log the error at rank 0.
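
To make the concern concrete, here is a small simulation without MPI. Everything in it is made up for illustration: the cell ids, the template names, the broken template, and the round-robin distribution of cells over ranks. It is not how neurodamus assigns cells.

```python
# Hypothetical data: the EModel template each cell needs, and one broken template.
CELL_TEMPLATES = {1: "emodel_a", 2: "emodel_b", 3: "emodel_a", 4: "emodel_c"}
BROKEN_TEMPLATES = {"emodel_c"}


def cells_for_rank(rank, n_ranks):
    """Round-robin distribution of cell ids over ranks (an assumption of this sketch)."""
    return [gid for i, gid in enumerate(sorted(CELL_TEMPLATES)) if i % n_ranks == rank]


def load_rank(rank, n_ranks):
    """Return the error messages this rank would see while loading its own cells."""
    errors = []
    for gid in cells_for_rank(rank, n_ranks):
        template = CELL_TEMPLATES[gid]
        if template in BROKEN_TEMPLATES:
            errors.append(f"cell {gid}: cannot load template {template}")
    return errors


# With two ranks, rank 0 loads cells 1 and 3, rank 1 loads cells 2 and 4.
errors_by_rank = {rank: load_rank(rank, n_ranks=2) for rank in range(2)}
```

In this setup only rank 1 touches the broken template, so `errors_by_rank[0]` is empty and a handler that logs only on rank 0 would report nothing.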

matz-e · 2024-03-22T08:46:52Z

neurodamus/commands.py

+    except Exception as e:
+        # at this stage, this is an error in the simulation itself, can happen in individual nodes
+        # it is OK to log it everywhere it happens
+        logging.critical(f"Unhandled exception. Terminating: {str(e)}", sys.exc_info())


Can we use logging.exception, would that make sense?
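
For reference, a minimal sketch of what that could look like. `run_and_log`, `failing_step` and the `"sketch"` logger are hypothetical names for this example only. Note that in the quoted line `sys.exc_info()` is passed positionally, and positional arguments after the message are used by `logging` as %-format arguments, not as traceback information; the traceback is normally requested with the `exc_info` keyword, which `logging.exception` sets implicitly.

```python
import io
import logging


def run_and_log(step, logger):
    """Run `step` and log any exception with its traceback; return True on success."""
    try:
        step()
    except Exception:
        # logging.exception logs at ERROR level and appends the active traceback,
        # so exc_info does not have to be passed by hand.
        logger.exception("Unhandled exception. Terminating")
        return False
    return True


def failing_step():
    raise ValueError("bad synapse parameters")


# Capture the output in memory so the sketch is self-contained.
stream = io.StringIO()
logger = logging.getLogger("sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

ok = run_and_log(failing_step, logger)
output = stream.getvalue()
```

`logging.exception` logs at ERROR level; if the CRITICAL level should be kept, `logging.critical("...", exc_info=True)` gives the same traceback output.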

WeinaJi · 2024-05-13T11:38:15Z

Following the final decision on BBPBGLIB-1139, we can close this PR.

atemerev added 2 commits March 21, 2024 14:40

Separate exception handling on config/model loading and simulation ru…

bf20c14

…nning. Remove the synchronization mechanism for error logging. Config/model errors are logged only on Rank 0; other exceptions can be different in different parts of the simulation and logged everywhere.

Typo in simulation_sonata.json

e5b7afd

atemerev changed the title ~~Atemerev/separate config exception~~ [BBPBGLIB-1139] config / model error logging fix, stage 2 Mar 21, 2024

pramodk requested review from jorblancoa and WeinaJi March 21, 2024 14:05

matz-e reviewed Mar 22, 2024

WeinaJi closed this May 13, 2024
