Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BBPBGLIB-1139] config / model error logging fix, stage 2 #146

Closed
wants to merge 2 commits into from

Conversation

atemerev
Copy link
Contributor

Context

On simulation launch in commands.py, exceptions were handled in the same way for configuration / model loading errors, and simulation errors. This required quirky synchronization to make sure that exceptions were logged only at a single MPI node, otherwise they flood the output.

The idea of this PR is to separate exception handling for configuration parsing / model loading (these errors are the same for all nodes, and supposed to be logged only at the master node), and simulation errors (can happen only at some runs, and can be different everywhere, and perhaps need to be logged at all nodes).

Scope

Separate exception handling at model loading stage and simulation run stage. Call _mpi_abort only in the latter case. In the former case, log errors only on the node with the MPI rank 0.

Testing

Again, I don't think it is feasible to write a unit test for this.

Review

  • PR description is complete
  • Coding style (imports, function length, New functions, classes or files) are good
  • [N/A] Unit/Scientific test added
  • Updated Readme, in-code, developer documentation

…nning. Remove the synchronization mechanism for error logging. Config/model errors are logged only on Rank 0; other exceptions can be different in different parts of the simulation and logged everywhere.
@atemerev atemerev changed the title Atemerev/separate config exception [BBPBGLIB-1139] config / model error logging fix, stage 2 Mar 21, 2024
@bbpbuildbot
Copy link

@WeinaJi
Copy link
Collaborator

WeinaJi commented Mar 22, 2024

Hi @atemerev , there is the exception type ConfigurationError which should be raised for errors during reading config files. This exception should be raised by all ranks, and to be caught properly. The errors during modelling, such as read and creation of cells and synapses are more complex. They may happen in some of the ranks but not all. An example what I can think of is loading the emodel hoc template in Cell_V6._instantiate_cell where the EModel files are from scientists and some of them may contain errors. As difference cells require different EModel templates, the EModel files load in each rank are not the same meaning that we may have errors in some of the ranks. In this case, we may not be able to log it at rank 0.

except Exception as e:
# at this stage, this is an error in the simulation itself, can happen in individual nodes
# it is OK to log it everywhere it happens
logging.critical(f"Unhandled exception. Terminating: {str(e)}", sys.exc_info())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use logging.exception, would that make sense?

@WeinaJi
Copy link
Collaborator

WeinaJi commented May 13, 2024

After the final decision addressed on BBPBGLIB-1139, we can close this PR.

@WeinaJi WeinaJi closed this May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants