114 tests are failing on gpu machine #115

jank324 · 2023-12-11T19:07:39Z

Description

This PR overhauls the way devices and dtypes are handled by Elements and Beams. The idea is to bring it in line with how native PyTorch Modules like nn.Linear do it (https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear). It also addresses the point made in (https://stackoverflow.com/questions/58926054/how-to-get-the-device-type-of-a-pytorch-module-conveniently) that Modules shouldn't have overarching type and device properties, because their parameters and buffers may be on different devices and of different types.
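For illustration, the nn.Linear-style pattern referred to here can be sketched as follows. This is a minimal sketch and not Cheetah's actual code: `ToyQuadrupole` and its arguments are hypothetical names chosen for the example. The constructor forwards `device` and `dtype` to tensor creation, and the module itself exposes no overarching `device` or `dtype` attribute; both are read from the individual parameters and buffers instead.

```python
import torch
from torch import nn


class ToyQuadrupole(nn.Module):
    """Hypothetical element used for illustration; not Cheetah's actual class."""

    def __init__(self, length, k1, device=None, dtype=None):
        super().__init__()
        # Same idea as in nn.Linear: collect device and dtype once and pass
        # them to every tensor the module creates.
        factory_kwargs = {"device": device, "dtype": dtype}
        self.length = nn.Parameter(torch.as_tensor(length, **factory_kwargs))
        self.register_buffer("k1", torch.as_tensor(k1, **factory_kwargs))


quad = ToyQuadrupole(length=0.2, k1=4.2, dtype=torch.float64)

# There is no module-wide device or dtype; inspect each parameter and buffer.
dtypes = {name: tensor.dtype for name, tensor in quad.state_dict().items()}
devices = {name: tensor.device for name, tensor in quad.state_dict().items()}
```

Reading the dtype and device per tensor stays correct even if parameters and buffers end up on different devices or with different dtypes.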

On a more practical note, this makes the entire device system much more robust and basically completes MPS support from Cheetah's side (#61; not all operations we use are implemented in PyTorch for MPS yet).

A minor downside is that we no longer do automatic device selection. This does, however, make that part of Cheetah more predictable (previously, a lot of my code using Cheetah had to fix the device explicitly because weird things would happen otherwise).

This PR also prepares an eventual fix for #113.

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code and checked that formatting passes (required).
I have fixed all issues found by flake8 (required).
I have ensured that all pytest tests pass (required).
I have run pytest on a machine with a CUDA GPU and made sure all tests pass (required).
I have checked that the documentation builds (required).

Note: We are using a maximum length of 88 characters per line

cr-xu · 2023-12-12T10:19:29Z

If I read the changes correctly, now we can safely switch between [float16, float32, float64] for elements, right?

jank324 · 2023-12-12T10:21:30Z

I haven't tested it (other than quickly trying to track through a quadrupole with MPS, see #61), but in theory, we should now be able to safely switch between all dtypes and devices, yes.
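A quick way to check this kind of claim on a plain PyTorch module is sketched below. It is only an illustration under stated assumptions: `ToyDrift` is a hypothetical stand-in, not a Cheetah element, and the check covers only float32 and float64 (float16 is left out because half-precision operator coverage varies by backend). The device is chosen explicitly by the caller, and `nn.Module.to` moves and casts the module's buffers.

```python
import torch
from torch import nn


class ToyDrift(nn.Module):
    """Hypothetical stand-in for an element; not part of Cheetah."""

    def __init__(self, length, device=None, dtype=torch.float32):
        super().__init__()
        self.register_buffer(
            "length", torch.as_tensor(length, device=device, dtype=dtype)
        )

    def forward(self, x):
        # Toy drift map on (position, angle) pairs: position += length * angle.
        position, angle = x[..., 0], x[..., 1]
        return torch.stack([position + self.length * angle, angle], dim=-1)


# The caller picks the device explicitly; nothing is selected automatically.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

drift = ToyDrift(length=2.0)
results = {}
for dtype in (torch.float32, torch.float64):
    # Module.to casts floating-point parameters and buffers and moves them.
    drift = drift.to(device=device, dtype=dtype)
    particles = torch.tensor([[1.0, 0.5]], device=device, dtype=dtype)
    results[dtype] = drift(particles)
```

If the module created new tensors internally without matching the dtype and device of its inputs or buffers, the float64 pass would silently be computed in a different precision or fail with a device mismatch on a GPU, which is the class of problem a test like this is meant to catch.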

cheetah/track_methods.py

cr-xu · 2023-12-18T11:04:14Z

CHANGELOG.md

+- The handling of `device` and `dtype` was overhauled. They might not behave as expected. `Element`s also no longer have a `device` attribute. (see #115) (@jank324)



I think it's worth mentioning explicitly in the documentation that this actually changes the behaviour of element creation, i.e. one can now conveniently create elements as before v0.6.0, without having to wrap every parameter in a tensor first:

dipole = cheetah.Dipole(length=0.1, angle=0.1, dtype=torch.float32)

I think you are right, but I also think this should be tested before making that claim.

jank324 and others added 8 commits December 11, 2023 16:20

Add test to find Ocelot conversion device issue

ae9ea59

Fix tests on CPU-only machine

ba905f4

Add test to check that torch to method works

8469d89

Fix formatting

7f4cbb5

Remove element device property

9e0db2e

Fix how types and devices are handled

cc206e0

Fix tests failing on GPU machine

ba5fcc6

Fix formatting

6805817

jank324 linked an issue Dec 11, 2023 that may be closed by this pull request

Tests are failing on GPU machine #114

Closed

Update changelog

4223bcf

jank324 self-assigned this Dec 11, 2023

jank324 added the bug Something isn't working label Dec 11, 2023

jank324 marked this pull request as ready for review December 11, 2023 19:30

jank324 added 2 commits December 11, 2023 20:53

Add convenient device and dtype args to Ocelot import

02b0469

Fix screen extent computation return type

ff9c409

jank324 requested a review from cr-xu December 18, 2023 10:29

jank324 linked an issue Dec 18, 2023 that may be closed by this pull request

Moving elements and beams to devices doesn't work #113

Closed

jank324 removed a link to an issue Dec 18, 2023

Moving elements and beams to devices doesn't work #113

Closed

Add test for float64 tracking; Remove some large rtol in tests

cc763d7

cr-xu reviewed Dec 18, 2023


cr-xu merged commit 2549df9 into master Dec 19, 2023
9 checks passed

jank324 deleted the 114-tests-are-failing-on-gpu-machine branch February 5, 2024 16:44
