Ocean fails stand-alone decomp test, intel optimized #5219

Closed

mark-petersen opened this issue Oct 6, 2022 · 15 comments · Fixed by #5356

@mark-petersen (Contributor)

MPAS-Ocean nightly on master 4dcc8db fails

ocean/baroclinic_channel/10km/decomp_test
ocean/global_ocean/QU240/PHC/RK4/decomp_test

with the intel optimized compiler and OpenMP. Differences between the 4-processor and 8-processor runs are at most 1e-13.

@mark-petersen (Contributor, Author)

Earlier this week, I thought it failed

ocean/global_ocean/QU240/PHC/decomp_test

with intel optimized, but I don't see that on badger with 4dcc8db now; maybe I'm mixing it up, though.

@mark-petersen (Contributor, Author)

I tested the very first SMEP merge, bb84429 Merge branch 'vanroekel/ocean/add-submesoscale-eddies' (PR #5099):

PASS ocean/baroclinic_channel/10km/decomp_test
FAIL ocean/global_ocean/QU240/PHC/decomp_test
PASS ocean/global_ocean/QU240/PHC/RK4/decomp_test

with intel optimized on badger. This is confusing.

@xylar (Contributor) commented Oct 6, 2022

I'm trying to use bisection to find the cause of this, but the runs just hang when I try to test c63cce2. This may have more to do with a bad node on Anvil or something, but it's certainly not helping me debug...

@xylar (Contributor) commented Oct 7, 2022

@mark-petersen, I agree that #5099 is responsible and that this is probably already fixed in #5216. I'll make sure.

@xylar (Contributor) commented Oct 7, 2022

Nope, the test is still failing after #5216 so there's another threading problem from #5099 that we need to track down.

@xylar (Contributor) commented Oct 10, 2022

Sorry, I wasn't clear in my mind. We're looking for a decomposition problem, not a threading problem. Much trickier in some ways!

@xylar (Contributor) commented Nov 29, 2022

I believe this was introduced by #5183, not by #5099. At least that is what I'm seeing in testing on Chrysalis with Intel and Intel-MPI: test execution passes for #5170 (the previous ocean-related merge) but fails for #5183. No PRs were merged between these two, so it seems likely that #5183 is responsible, though why is not at all clear at this point.

@xylar (Contributor) commented Nov 30, 2022

After rerunning the pr test suite today, this test case runs fine for #5183, so yesterday's failures seem random and unrelated to this issue. Still investigating.

@xylar (Contributor) commented Nov 30, 2022

@mark-petersen and @dengwirda, I now believe this issue was introduced by #5195. While I ran into sporadic execution failures of ocean/baroclinic_channel/10km/decomp_test before that PR, I find consistent comparison (4 vs. 8 processor) failures after it. In my testing, ocean/baroclinic_channel/10km/decomp_test still passes at #5172 but fails starting at #5195. I have not tested #5178 and #5182, which were merged between these two, but I think they are highly unlikely to have caused this because they don't involve any standalone MPAS code.

I'm not at all confident about this but the most likely culprit to my eyes is this new loop:

!$omp parallel
!$omp do schedule(runtime) &
!$omp private(cell1, cell2, k, thicknessSum)
do iEdge = nEdgesOwned+1, nEdgesArray(4)
   cell1 = cellsOnEdge(1,iEdge)
   cell2 = cellsOnEdge(2,iEdge)
   thicknessSum = layerThickEdgeFlux(minLevelEdgeBot(iEdge),iEdge)
   do k = minLevelEdgeBot(iEdge)+1, maxLevelEdgeTop(iEdge)
      thicknessSum = thicknessSum + &
                     layerThickEdgeFlux(k,iEdge)
   enddo
   bottomDepthEdge(iEdge) = thicknessSum &
      - 0.5_RKIND*(sshNew(cell1) + sshNew(cell2))
enddo ! iEdge
!$omp end do
!$omp end parallel

It seems like the OpenMP directives may not cover all the variables they need to. Some were fixed in #5226, but maybe some are still missing?
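
To illustrate the concern, here is a minimal stand-alone sketch (not MPAS code; all names are hypothetical): if a loop-local scalar such as the per-column sum were left out of the private clause, every thread would update one shared copy and the results would vary from run to run.

program private_demo
   implicit none
   integer, parameter :: RKIND = selected_real_kind(12)
   integer, parameter :: nEdges = 1000, nLevels = 40
   real(RKIND) :: thick(nLevels, nEdges), colSum(nEdges), total
   integer :: iEdge, k

   thick = 50.0_RKIND

   ! k and total must be listed as private; the parallel loop index iEdge
   ! is private automatically. Omitting total from this clause would let
   ! threads race on a single shared copy.
   !$omp parallel do private(k, total)
   do iEdge = 1, nEdges
      total = 0.0_RKIND
      do k = 1, nLevels
         total = total + thick(k, iEdge)
      enddo
      colSum(iEdge) = total
   enddo
   !$omp end parallel do

   print *, 'colSum(1) =', colSum(1)   ! expect 40 levels * 50 = 2000
end program private_demo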

I'm still investigating.

@xylar (Contributor) commented Nov 30, 2022

Another possibility is this line:

do iEdge = nEdgesOwned+1, nEdgesArray(4)

It could be that 4 is out of range for nEdgesArray. If so, this would not be the only place where it is indexed out of bounds: the split implicit solver also indexes to config_num_halos + 1, which defaults to 4. I couldn't find any other code that indexes to this halo, so it could be that it isn't guaranteed to exist and doesn't exist for some reason (e.g. small mesh size?) in the baroclinic channel test case.
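
Purely as an illustration (a sketch assuming nEdgesArray holds cumulative edge counts per halo level, as the loop above implies), the hard-coded index could be clamped so it can never run past the end of the array:

! Hypothetical guard (haloLevel is an illustrative local integer): request
! halo level 4, but never index beyond what nEdgesArray actually provides.
haloLevel = min(4, size(nEdgesArray))
do iEdge = nEdgesOwned+1, nEdgesArray(haloLevel)
   ! ... same loop body as quoted above ...
enddo ! iEdge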

@xylar (Contributor) commented Nov 30, 2022

I'm going to quickly try rerunning that test case with an index of nEdgesArray(3) instead of nEdgesArray(4) to see if it passes. Then, we can figure out what the deal is.

@xylar (Contributor) commented Nov 30, 2022

Still fails with nEdgesArray(3) so that's not it.

@mark-petersen (Contributor, Author)

Well, I spent some time on this and did not figure it out. But it appears that layerThickEdgeFlux has a machine-precision mismatch between processors when using intel. If I set this in components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F (line 739):

    layerThickEdgeFlux(:,1:nEdgesAll) = 50.0_RKIND

then I get a decomp test match for the baroclinic channel. This is not a solution, of course, because it overwrites the actual values in the array. I also tried some other things, like an extra halo update and rounding that array, but those didn't fix it.
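
For reference, a minimal sketch of the sort of rounding experiment mentioned above (an assumption about the approach, not the code that was actually tried): truncate the array to a fixed number of decimal places so that differences in the last few bits cannot influence the comparison.

! Hypothetical diagnostic only: round layerThickEdgeFlux to 8 decimal
! places; if the decomp test then matched, the mismatch would be confined
! to round-off level.
layerThickEdgeFlux(:,1:nEdgesAll) = &
   anint(layerThickEdgeFlux(:,1:nEdgesAll) * 1.0e8_RKIND) * 1.0e-8_RKIND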

@mark-petersen (Contributor, Author)

I finally found it. The computation of bottomDepthEdge was separated into two loops: one from 1:nEdgesOwned (with many other calculations) and another from nEdgesOwned+1:nEdgesArray(4) to finish up the halo. Apparently the intel optimized compiler changes the order of floating-point operations on its own. Moving the calculation of bottomDepthEdge entirely into the second loop, and looping over all edges, makes ocean/baroclinic_channel/10km/decomp_test pass:

+++ b/components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F
@@ -775,8 +775,6 @@ module ocn_time_integration_split
-               bottomDepthEdge(iEdge) = thicknessSum &
-                  - 0.5_RKIND*(sshNew(cell1) + sshNew(cell2))

@@ -798,7 +798,7 @@ module ocn_time_integration_split
-            do iEdge = nEdgesOwned+1, nEdgesArray(4)
+            do iEdge = 1, nEdgesArray(4)

Whew!

On a side note, the hard-coded 4 ignores the number of halo layers, which can be set in the namelist. I'll actually change it to

            do iEdge = 1, nEdgesArray(size(nEdgesArray)-1)

which includes all edges within the halo, but not the outside edges of the last halo layer.
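
Putting the pieces together, here is a sketch of the single merged loop with that bound (variable names taken from the halo loop quoted earlier; the exact code is in the fix PR):

!$omp parallel
!$omp do schedule(runtime) &
!$omp private(cell1, cell2, k, thicknessSum)
do iEdge = 1, nEdgesArray(size(nEdgesArray)-1)
   cell1 = cellsOnEdge(1,iEdge)
   cell2 = cellsOnEdge(2,iEdge)
   thicknessSum = layerThickEdgeFlux(minLevelEdgeBot(iEdge),iEdge)
   do k = minLevelEdgeBot(iEdge)+1, maxLevelEdgeTop(iEdge)
      thicknessSum = thicknessSum + layerThickEdgeFlux(k,iEdge)
   enddo
   ! The vertical sum is built with the same loop structure for every edge
   ! on every rank, so the compiler applies the same reordering of the
   ! additions regardless of which partition owns the edge.
   bottomDepthEdge(iEdge) = thicknessSum &
      - 0.5_RKIND*(sshNew(cell1) + sshNew(cell2))
enddo ! iEdge
!$omp end do
!$omp end parallel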

@xylar (Contributor) commented Dec 6, 2022

Very nice detective work, @mark-petersen! I agree that we should not hard-code the halo size so I'm very happy with your recommended solution.

jonbob added a commit that referenced this issue Dec 13, 2022
Move bottomDepthEdge calculation to single loop over all edges (#5356)

After #5195 was merged, the MPAS-Ocean standalone test
ocean/baroclinic_channel/10km/decomp_test failed to match between 4 and
8 partitions, but only for intel optimized. All compass nightly suite
tests passed for gnu debug, gnu optimized, intel debug.

This PR solves the problem by merging the computation of bottomDepthEdge
into a single edge loop. Previously it was split into two loops:
1:nEdgesOwned (with many other calculations) and
nEdgesOwned+1:nEdgesArray(4). The intel optimized compiler must have
changed the order of operations in these two loops for different partitions.

Fixes #5219
[BFB]
jonbob closed this as completed in 0614e7b on Dec 14, 2022