Apple M1 vs x86_64 floating point differences. #3123

nrnhines · 2024-10-10T11:47:01Z

I observe occasional slight differences in floating-point results between Mac M1 and Linux x86_64. Those differences sometimes accumulate and cause CI failures on macOS. This issue is blocking progress on #1960.

Here is an example of a difference that arises in the triangularization phase of Gaussian elimination, during the first time step of a 1 compartment HH model.

The instrumentation of static void triang(NrnThread* _nt) in nrn/src/nrnoc/solve.cpp was:

printf("R fegetround %d\n", fegetround());
printf("Q triang\n");
    for (i = i3 - 1; i >= i2; --i) {
        auto const p = vec_a[i] / vec_d[i];
printf("Q i=%d a=%.17g d=%.17g p=%.17g\n", i, vec_a[i], vec_d[i], p);
        auto const pi = _nt->_v_parent_index[i];
        vec_d[pi] -= p * vec_b[i];
printf("Q   d[%d]=%.17g b[%d]=%.17g\n", pi, vec_d[pi], i, vec_b[i]);
        vec_rhs[pi] -= p * vec_rhs[i];
printf("Q   rhs[%d]=%.17g rhs[%d]=%.17g\n", pi, vec_rhs[pi], i, vec_rhs[i]);
    }

Bit-identical double precision results may require an additional set of comparison files generated by Apple M1 runs. Alternatively, suggestions from ChatGPT that might be explored are:

Consistent compiler flags: Make sure you're using the same floating-point-related compiler flags on both platforms. Avoid flags like -ffast-math that can introduce non-IEEE-compliant optimizations.

Recompile with stricter floating-point options: Use -fp-model precise (Intel) or equivalent options to enforce stricter IEEE 754 compliance across different architectures.

Use deterministic math libraries: If available, use math libraries designed to ensure deterministic results across platforms.
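For the first option mentioned above (a separate set of comparison files for Apple M1), the test could pick its reference file from the machine architecture. This is only a sketch: the function name and the _arm64 suffix are hypothetical, and no such file exists in the repository as far as this issue shows.

```python
import platform


def reference_filename(base="test_nrntest_fast", machine=None):
    """Return a per-architecture reference file name.

    The "_arm64" suffix is a hypothetical naming scheme used only for
    illustration; it is not an existing NEURON convention.
    """
    machine = (machine or platform.machine()).lower()
    # macOS on Apple Silicon reports "arm64"; Linux on ARM reports "aarch64".
    if machine in ("arm64", "aarch64"):
        return base + "_arm64.json"
    return base + ".json"


print(reference_filename())
```

The drawback of this approach is that every reference update has to be regenerated on each architecture, which is why the compiler flag route may be preferable.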

nrnhines · 2024-10-10T12:23:03Z

I asked ChatGPT what the clang equivalent of -fp-model precise would be and got explanations for each of

clang -ffp-contract=off -fno-fast-math -frounding-math -fsignaling-nans -fno-strict-aliasing ...

It turns out that the above difference in d[0] goes away with just -ffp-contract=off.
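A plausible explanation is that, with contraction enabled, clang may compile vec_d[pi] -= p * vec_b[i] into a fused multiply-add on the M1, which rounds once, whereas the unfused form rounds the product and then the subtraction. The sketch below emulates both in Python, using exact rational arithmetic to stand in for the fused operation. The input values are chosen for illustration only and are not taken from the HH run.

```python
from fractions import Fraction


def update_unfused(d, p, b):
    """d - p*b with two roundings: the product is rounded, then the difference."""
    return d - p * b


def update_fused(d, p, b):
    """d - p*b with a single rounding, emulating a fused multiply-add.

    The expression is evaluated exactly with rationals and rounded once
    when converted back to a double.
    """
    return float(Fraction(d) - Fraction(p) * Fraction(b))


# Illustrative values: the exact product p*b is 1 - 2**-54, which is not
# representable as a double and rounds to 1.0 in the unfused version.
d = 1.0
p = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27

print("unfused:", repr(update_unfused(d, p, b)))
print("fused:  ", repr(update_fused(d, p, b)))
```

Here the unfused update gives exactly 0.0 while the fused one gives 2**-54. Differences of this size in d[pi] are the kind that can accumulate over many time steps.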

By the way, with Ninja it is particularly simple to add such a flag for a single file: search build.ninja for the cpp file and append the flag to FLAGS in its build rule.

nrnhines · 2024-10-10T18:21:55Z

With Mac cmake build

cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=install -DCMAKE_BUILD_TYPE=Debug -DNRN_ENABLE_RX3D=OFF -DNRN_ENABLE_TESTS=ON  -DCMAKE_C_FLAGS="-ffp-contract=off" -DCMAKE_CXX_FLAGS="-ffp-contract=off"

ctest -j 8 fails only for

         16 - pytest_coreneuron::basic_tests_py3.12 (Failed)
         94 - parallel::nrntest_fast (Failed)

Returning to the current master (which uses cvode version 2), I instrumented test_nrntest_fast.py to print the test name and the max difference between the current run and the standard test_nrntest_fast.json. The run was python -m pytest -s test_nrntest_fast.py, with -ffp-contract=off on the M1.

The instrumentation changes are:

diff --git a/test/pytest_coreneuron/test_nrntest_fast.py b/test/pytest_coreneuron/test_nrntest_fast.py
index fabf61978..4256f2544 100644
--- a/test/pytest_coreneuron/test_nrntest_fast.py
+++ b/test/pytest_coreneuron/test_nrntest_fast.py
@@ -49,10 +49,13 @@ def chk():
     dir_path = os.path.dirname(os.path.realpath(__file__))
     fname = "test_nrntest_fast.json"
     if True:
-        if h.CVode().version().split(".")[0] == "3":
-            fname = "test_nrntest_fast_cv3.json"
-            global cvtol
-            cvtol = cv3tol
+        try:
+            if h.CVode().version().split(".")[0] == "3":
+                fname = "test_nrntest_fast_cv3.json"
+                global cvtol
+                cvtol = cv3tol
+        except:
+            pass
 
     print(fname)
     checker = Chk(os.path.join(dir_path, fname))
@@ -170,6 +173,7 @@ def compare_time_and_voltage_trajectories(
     method = model_data["method"]  # cvode or fixed
 
     # Determine which data we will use as a reference
+    title = "%s %d %s %s"%(name, threads, field, method)
     if threads == 1:
         # threads=1: compare to reference from JSON file on disk
         key = name + ":"
@@ -178,6 +182,7 @@ def compare_time_and_voltage_trajectories(
         else:
             key += method
         ref_data = chk.get(key, None)
+
         if ref_data is None:
             # No comparison to be done; store the data as a new reference
             chk(key, model_data[threads])
@@ -216,14 +221,19 @@ def compare_time_and_voltage_trajectories(
         these_vals = this_data[name][field]
         ref_vals = ref_data[name][field]
         assert len(these_vals) == len(ref_vals)
+        i=-1
         for a, b in zip(these_vals, ref_vals):
-            match = math.isclose(a, b, rel_tol=tolerance)
+            i += 1
+            match = math.isclose(a, b, rel_tol=0.0)#tolerance)
             if match:
                 continue
             diff = abs(a - b) / max(abs(a), abs(b))
+#            print(name, i, a, b, diff)
             max_diff = max(diff, max_diff)
+    print(title, "max_diff = ", max_diff, tolerance)
     if max_diff > tolerance:
-        raise Exception("max diff {} > {}".format(max_diff, tolerance))
+        # raise Exception("max diff {} > {}".format(max_diff, tolerance))
+        pass

Given this observation, it seems reasonable to focus on the fixed-step differences between x86_64 and M1.
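The measurement made by the instrumentation above can be isolated into a small helper. This is a minimal sketch, not the code in test_nrntest_fast.py: the function name is hypothetical, and it also reports the index of the worst sample, which the original diff only printed in a commented-out line.

```python
def max_relative_difference(values, reference):
    """Return (max_diff, index) for two equally long sequences of floats.

    Samples that compare exactly equal are skipped, as with
    math.isclose(a, b, rel_tol=0.0), so the division below never sees
    a == b == 0. The index is None when every sample is identical.
    """
    if len(values) != len(reference):
        raise ValueError("sequences differ in length")
    max_diff = 0.0
    worst_index = None
    for i, (a, b) in enumerate(zip(values, reference)):
        if a == b:
            continue
        diff = abs(a - b) / max(abs(a), abs(b))
        if diff > max_diff:
            max_diff = diff
            worst_index = i
    return max_diff, worst_index


# Made-up trajectories, for illustration only.
print(max_relative_difference([1.0, 2.0, -65.0], [1.0, 4.0, -65.0]))
```

Reporting the index makes it easier to see whether the first mismatch occurs at the first time step or only after differences have accumulated.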

nrnhines · 2024-10-14T14:40:24Z

Just a note: on the Mac M1, I get identical results between the default and -ffp-model=precise. I also get identical results between -ffp-model=strict and -ffp-contract=off. This is consistent with the table at https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior, where precise corresponds to -ffp-contract=on and strict to -ffp-contract=off.

Here is the difference between precise and contract off

nrnhines · 2024-10-15T13:55:19Z

In addition to -ffp-contract=off, using branch hines/digest-debug-3 together with h.use_exp_pow_precision(1) fixes the fixed-step issues when running python3 -m pytest -s test_nrntest_fast.py, i.e.
