
114 tests are failing on gpu machine #115

Merged: 12 commits, merged on Dec 19, 2023


jank324 (Member) commented Dec 11, 2023

Description

This PR overhauls the way devices and types are handled by Elements and Beams. The idea is to bring this more in line with how native PyTorch Modules like nn.Linear do it (https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear). It also addresses the point, made in (https://stackoverflow.com/questions/58926054/how-to-get-the-device-type-of-a-pytorch-module-conveniently), that Modules shouldn't have overarching type and device properties, because their parameters and buffers may be on different devices and of different types.
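For readers unfamiliar with the nn.Linear convention, the pattern looks roughly like the sketch below. ToyDrift is a hypothetical stand-in, not Cheetah's actual Drift class. It only illustrates the "factory_kwargs" idiom used in PyTorch's own modules: device and dtype are accepted as constructor keyword arguments and passed on to the tensors the module owns, instead of being stored as module attributes.

```python
from typing import Optional, Union

import torch
from torch import nn


class ToyDrift(nn.Module):
    """Hypothetical element illustrating the nn.Linear-style factory_kwargs idiom."""

    def __init__(
        self,
        length: Union[float, torch.Tensor],
        name: Optional[str] = None,
        device: Optional[Union[str, torch.device]] = None,
        dtype: torch.dtype = torch.float32,
    ) -> None:
        # device and dtype are only used to create tensors; they are not stored
        factory_kwargs = {"device": device, "dtype": dtype}
        super().__init__()
        self.name = name
        # as_tensor accepts Python floats as well as existing tensors
        self.register_buffer("length", torch.as_tensor(length, **factory_kwargs))

    def transfer_map(self) -> torch.Tensor:
        # New tensors follow the device and dtype of the buffers, not of the module
        tm = torch.eye(2, device=self.length.device, dtype=self.length.dtype)
        tm[0, 1] = self.length
        return tm
```

For example, ToyDrift(length=0.5, dtype=torch.float64) stores length as a float64 buffer and has no device attribute; where its tensors live can be read per tensor, e.g. from named_buffers(), and drift.to(torch.float32) converts the buffer like in any other PyTorch module.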

On a more practical note, this makes the entire device system much more robust and basically completes MPS support from Cheetah's side (#61; note that PyTorch does not yet implement all the operations we use for MPS).

A minor downside is that we no longer do automatic device selection. This does, however, make that part of Cheetah more predictable (previously, a lot of my code using Cheetah had to fix the device explicitly, because otherwise weird things would happen).
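Since device selection is no longer automatic, the choice moves into user code. A minimal sketch of doing it once, explicitly, is shown below. pick_device is a hypothetical helper, not part of Cheetah, and the preference order CUDA, then MPS, then CPU is an assumption; as noted above, some operations may still be missing on MPS.

```python
import torch


def pick_device() -> torch.device:
    """Hypothetical helper: choose a device explicitly, once, in user code."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is available in PyTorch >= 1.12
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
# Pass the device explicitly when creating tensors ...
x = torch.zeros(3, device=device)
# ... and move existing modules with .to(device)
layer = torch.nn.Linear(3, 1).to(device)
y = layer(x)
```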

This PR also prepares an eventual fix for #113.

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code and checked that formatting passes (required).
  • I have fixed all issues found by flake8 (required).
  • I have ensured that all pytest tests pass (required).
  • I have run pytest on a machine with a CUDA GPU and made sure all tests pass (required).
  • I have checked that the documentation builds (required).

Note: We are using a maximum length of 88 characters per line

@jank324 jank324 linked an issue Dec 11, 2023 that may be closed by this pull request
@jank324 jank324 self-assigned this Dec 11, 2023
@jank324 jank324 added the bug Something isn't working label Dec 11, 2023
@jank324 jank324 marked this pull request as ready for review December 11, 2023 19:30
cr-xu (Member) commented Dec 12, 2023

If I read the changes correctly, now we can safely switch between [float16, float32, float64] for elements, right?

jank324 (Member, Author) commented Dec 12, 2023

I haven't tested it (other than quickly trying to track through a quadrupole with MPS, see #61), but in theory, we should now be able to safely switch between all dtypes and devices, yes.
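For context on how such switching behaves in plain PyTorch: nn.Module.to(dtype) casts only floating-point and complex parameters and buffers and leaves integer ones untouched, which is one reason a module has no single dtype. The sketch below uses a hypothetical MixedBuffers module (not a Cheetah class) and steps it through float16, float32 and float64.

```python
import torch
from torch import nn


class MixedBuffers(nn.Module):
    """Hypothetical module holding one floating-point and one integer buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("k1", torch.tensor(4.2))  # float32 (default dtype)
        self.register_buffer("num_slices", torch.tensor(10))  # int64


module = MixedBuffers()
for dtype in (torch.float16, torch.float32, torch.float64):
    # .to(dtype) converts only the floating-point buffer k1
    module = module.to(dtype)
    print(dtype, {name: buf.dtype for name, buf in module.named_buffers()})
```

Note that precision lost in a down-cast is not recovered: after the round trip through float16, k1 holds 4.19921875 rather than 4.2, even though it ends up as float64.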

@jank324 jank324 requested a review from cr-xu December 18, 2023 10:29
@jank324 jank324 linked an issue Dec 18, 2023 that may be closed by this pull request
Review thread on cheetah/track_methods.py (resolved)
Comment on lines +7 to 8
- The handling of `device` and `dtype` was overhauled. They might not behave as expected. `Element`s also no longer have a `device` attribute. (see #115) (@jank324)

cr-xu (Member) commented Dec 18, 2023

I think it's worth mentioning explicitly in the documentation that this actually changes the behaviour of element creation, i.e. one can now create elements as conveniently as before v0.6.0, without having to wrap every parameter in a tensor first:

import cheetah
import torch

dipole = cheetah.Dipole(length=0.1, angle=0.1, dtype=torch.float32)

jank324 (Member, Author) replied

I think you are right, but I also think this should be tested before making that claim.

@cr-xu cr-xu merged commit 2549df9 into master Dec 19, 2023
9 checks passed
@jank324 jank324 deleted the 114-tests-are-failing-on-gpu-machine branch February 5, 2024 16:44
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Tests are failing on GPU machine
2 participants