Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix indenting to prevent repeated calculation in check_positive_jacobian() #1053

Merged
merged 2 commits into from
Jun 13, 2024

Conversation

YigitElma
Copy link
Collaborator

No description provided.

@YigitElma YigitElma self-assigned this Jun 13, 2024
@YigitElma YigitElma added easy Short and simple to code or review EZ-review labels Jun 13, 2024
Copy link

codecov bot commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.96%. Comparing base (c0ade01) to head (fb7545a).
Report is 1871 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1053   +/-   ##
=======================================
  Coverage   94.96%   94.96%           
=======================================
  Files          87       87           
  Lines       21776    21776           
=======================================
  Hits        20679    20679           
  Misses       1097     1097           
Files with missing lines Coverage Δ
desc/compat.py 83.96% <100.00%> (ø)

Copy link
Contributor

|             benchmark_name             |         dt(%)          |         dt(s)          |        t_new(s)        |        t_old(s)        | 
| -------------------------------------- | ---------------------- | ---------------------- | ---------------------- | ---------------------- |
 test_build_transform_fft_lowres         |     -1.34 +/- 6.72     | -7.83e-03 +/- 3.94e-02 |  5.78e-01 +/- 3.2e-02  |  5.86e-01 +/- 2.3e-02  |
 test_build_transform_fft_midres         |     -2.90 +/- 1.69     | -1.94e-02 +/- 1.13e-02 |  6.51e-01 +/- 7.7e-03  |  6.71e-01 +/- 8.3e-03  |
 test_build_transform_fft_highres        |     -0.69 +/- 2.61     | -7.30e-03 +/- 2.75e-02 |  1.05e+00 +/- 1.1e-02  |  1.05e+00 +/- 2.5e-02  |
 test_equilibrium_init_lowres            |     -5.27 +/- 6.09     | -2.19e-01 +/- 2.53e-01 |  3.94e+00 +/- 2.1e-01  |  4.16e+00 +/- 1.5e-01  |
 test_equilibrium_init_medres            |     -1.54 +/- 5.36     | -7.06e-02 +/- 2.46e-01 |  4.51e+00 +/- 1.1e-01  |  4.59e+00 +/- 2.2e-01  |
 test_equilibrium_init_highres           |     -3.68 +/- 3.19     | -2.36e-01 +/- 2.04e-01 |  6.17e+00 +/- 1.5e-01  |  6.41e+00 +/- 1.4e-01  |
 test_objective_compile_dshape_current   |     +1.19 +/- 3.77     | +4.81e-02 +/- 1.52e-01 |  4.08e+00 +/- 4.4e-02  |  4.03e+00 +/- 1.5e-01  |
 test_objective_compile_atf              |     -1.04 +/- 5.00     | -9.09e-02 +/- 4.36e-01 |  8.62e+00 +/- 2.6e-01  |  8.71e+00 +/- 3.5e-01  |
 test_objective_compute_dshape_current   |     +1.76 +/- 5.34     | +2.10e-05 +/- 6.38e-05 |  1.21e-03 +/- 4.6e-05  |  1.19e-03 +/- 4.4e-05  |
 test_objective_compute_atf              |     -1.35 +/- 5.66     | -5.79e-05 +/- 2.42e-04 |  4.22e-03 +/- 2.1e-04  |  4.27e-03 +/- 1.2e-04  |
 test_objective_jac_dshape_current       |     -5.59 +/- 11.91    | -2.27e-03 +/- 4.83e-03 |  3.82e-02 +/- 1.6e-03  |  4.05e-02 +/- 4.5e-03  |
 test_objective_jac_atf                  |     +1.61 +/- 3.15     | +3.09e-02 +/- 6.03e-02 |  1.94e+00 +/- 4.4e-02  |  1.91e+00 +/- 4.1e-02  |
 test_perturb_1                          |     -2.68 +/- 3.45     | -3.77e-01 +/- 4.85e-01 |  1.37e+01 +/- 4.1e-01  |  1.41e+01 +/- 2.6e-01  |
 test_perturb_2                          |     -0.91 +/- 4.56     | -1.72e-01 +/- 8.66e-01 |  1.88e+01 +/- 5.9e-01  |  1.90e+01 +/- 6.3e-01  |
 test_proximal_jac_atf                   |     +0.36 +/- 1.34     | +2.69e-02 +/- 9.98e-02 |  7.46e+00 +/- 5.9e-02  |  7.43e+00 +/- 8.0e-02  |
 test_proximal_freeb_compute             |     +1.55 +/- 1.18     | +2.76e-03 +/- 2.10e-03 |  1.80e-01 +/- 1.5e-03  |  1.78e-01 +/- 1.5e-03  |
 test_proximal_freeb_jac                 |     -2.19 +/- 1.36     | -1.67e-01 +/- 1.04e-01 |  7.47e+00 +/- 9.2e-02  |  7.63e+00 +/- 4.7e-02  |

@YigitElma YigitElma merged commit 0a8ec62 into master Jun 13, 2024
18 checks passed
@YigitElma YigitElma deleted the yge/hotfix2 branch June 13, 2024 23:38
@YigitElma
Copy link
Collaborator Author

YigitElma commented Jun 13, 2024

I merged this branch (since it was clearly a bug, independent of the performance improvement), but the benchmarks caused a question in my mind about their validity. For equilibrium initialization benchmarks, it says that there is no change basically and even the highest equilibrium init takes 6 seconds. I think this is due to the benchmarks creating some very basic surface (the default one when nothing is given to FourierRZToroidalSurface) and this doesn't represent most of the actual cases. For comparison, I was doing GPU profiling and resolution 15 equilibrium takes consistently 25 seconds with this surface

surface_2D = FourierRZToroidalSurface(
        R_lmn=np.array([10, -1]),  # boundary coefficients                                                                                                                          
        Z_lmn=np.array([1]),
        modes_R=np.array([[0, 0], [1, 0]]),  # [M, N] boundary Fourier modes                                                                                                        
        modes_Z=np.array([[-1, 0]]),
        NFP=5,  # number of (toroidal) field periods (does not matter for 2D, but will for 3D solution)   

This is also very basic actually. Is this again some CPU/GPU difference issue? I was measuring at least 20% improvement on LMN=15 from 25 seconds to 20 seconds. Any thoughts @f0uriest @dpanici ?

@f0uriest
Copy link
Member

My hunch would be cpu/gpu difference, but im surprised its that much. You can try the regular line profiler on cpu and see where the differences are

@YigitElma
Copy link
Collaborator Author

YigitElma commented Jun 14, 2024

Yes, I can do that tomorrow. Also, I run the test script on my small laptop (the one we use for GPU), it takes only 13 seconds and 9-10 seconds when functions are jitted. This is probably again some for loop being too slow on GPU issue.

@f0uriest
Copy link
Member

certainly a possibility. If that's the case we should see if there's some small fixes we can do to make it faster on gpu

YigitElma added a commit that referenced this pull request Jul 24, 2024
Some of the initialization function use a lot of for loop that are
extremely slow on GPUs. If we move the execution to CPU, we get at least
x5 performance for equilibrium and geometry classes. Since the Github
actions are run on CPU, we won't see any speed up, but on GPU
interactive sessions, this should speed up plotting etc.

Add the 
```python
with jax.default_device(jax.devices("cpu")[0]):
```
for the parts you believe don't need to be run on GPUs @f0uriest
@dpanici @rahulgaur104 @ddudt @kianorr @unalmis

This was actually noticed on previous PR #1053 when the benchmark didn't
result in speed up, but the GPU eq init got faster by 5 seconds.

![image
(1)](https://github.com/PlasmaControl/DESC/assets/102380275/372c8f2a-07bf-4bc3-8238-1c1d91ea1ae1)

![image](https://github.com/PlasmaControl/DESC/assets/102380275/02badace-af44-4801-ae5d-3707f52aaa5d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
easy Short and simple to code or review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants