Apple M1 vs x86_64 floating point differences. #3123

Open
nrnhines opened this issue Oct 10, 2024 · 4 comments

@nrnhines
Member

I observe occasional slight differences in floating-point results between Mac M1 and Linux x86_64. Those differences sometimes accumulate and cause CI failures on macOS. This issue is blocking progress on #1960.

Here is an example of a difference in Gaussian elimination, arising during triangularization in the first time step of a one-compartment HH model.

The instrumentation in nrn/src/nrnoc/solve.cpp, in static void triang(NrnThread* _nt), was:

printf("R fegetround %d\n", fegetround());
printf("Q triang\n");
    for (i = i3 - 1; i >= i2; --i) {
        auto const p = vec_a[i] / vec_d[i];
printf("Q i=%d a=%.17g d=%.17g p=%.17g\n", i, vec_a[i], vec_d[i], p);
        auto const pi = _nt->_v_parent_index[i];
        vec_d[pi] -= p * vec_b[i];
printf("Q   d[%d]=%.17g b[%d]=%.17g\n", pi, vec_d[pi], i, vec_b[i]);
        vec_rhs[pi] -= p * vec_rhs[i];
printf("Q   rhs[%d]=%.17g rhs[%d]=%.17g\n", pi, vec_rhs[pi], i, vec_rhs[i]);
    }

image

Bit-identical double precision results may require an additional set of comparison files generated by Apple M1 runs. Alternatively, suggestions from ChatGPT that might be explored are:

Consistent compiler flags: Make sure you're using the same floating-point-related compiler flags on both platforms. Avoid flags like -ffast-math that can introduce non-IEEE-compliant optimizations.

Recompile with stricter floating-point options: Use -fp-model precise (Intel) or equivalent options to enforce stricter IEEE 754 compliance across different architectures.

Use deterministic math libraries: If available, use math libraries designed to ensure deterministic results across platforms.
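The first option above (a separate set of comparison files for Apple M1) could be sketched as follows. This is only an illustration: the function name and the `_arm64` file suffix are hypothetical and do not exist in the NEURON test suite.

```python
import platform


def reference_filename(machine=None, base="test_nrntest_fast"):
    """Pick a reference JSON file name for the given CPU architecture.

    The "_arm64" suffix is a hypothetical naming convention.
    """
    machine = (machine or platform.machine()).lower()
    # macOS reports "arm64" and Linux reports "aarch64" for 64-bit ARM.
    if machine in ("arm64", "aarch64"):
        return base + "_arm64.json"
    return base + ".json"


print(reference_filename())
```

The cost of this approach is that every reference update has to be regenerated on both architectures.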

@nrnhines
Member Author

I asked ChatGPT what the clang equivalent of -fp-model precise would be and got explanations for each of

clang -ffp-contract=off -fno-fast-math -frounding-math -fsignaling-nans -fno-strict-aliasing ...

It turns out that the above difference in d[0] goes away with just -ffp-contract=off.
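A likely explanation is fused multiply-add (FMA) contraction: with -ffp-contract=on, clang may compile an update such as d -= p * b into a single fused instruction that rounds once, whereas the separate multiply and subtract round twice. The sketch below emulates both forms in Python using exact rational arithmetic. The input values are chosen for illustration and are not taken from the HH model, and whether the compiler actually contracts this particular statement in triang is an assumption.

```python
from fractions import Fraction


def update_unfused(d, p, b):
    # Two roundings: p * b is rounded to a double, then the subtraction is rounded.
    return d - p * b


def update_fused(d, p, b):
    # One rounding: evaluate d - p*b exactly, then round once to a double,
    # which is what a hardware fused multiply-subtract does.
    exact = Fraction(d) - Fraction(p) * Fraction(b)
    return float(exact)


# Illustrative values: p*b = 1 + 2**-29 + 2**-60, and the 2**-60 term
# is lost when the product is rounded to a double.
p = b = 1.0 + 2.0**-30
d = 1.0 + 2.0**-29

print("unfused:", update_unfused(d, p, b))
print("fused:  ", update_fused(d, p, b))
```

Here the unfused result is exactly 0.0 while the fused result is -2**-60. Both are valid roundings under IEEE 754, so neither platform is wrong; they simply evaluate the expression differently.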

By the way, with ninja it is particularly simple to add such a flag: search build.ninja for the cpp file and add the flag to FLAGS in its build rule.

@nrnhines
Member Author

nrnhines commented Oct 10, 2024

With the Mac cmake build

cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=install -DCMAKE_BUILD_TYPE=Debug -DNRN_ENABLE_RX3D=OFF -DNRN_ENABLE_TESTS=ON  -DCMAKE_C_FLAGS="-ffp-contract=off" -DCMAKE_CXX_FLAGS="-ffp-contract=off"

ctest -j 8 fails only for

         16 - pytest_coreneuron::basic_tests_py3.12 (Failed)
         94 - parallel::nrntest_fast (Failed)

Returning to the current master (which uses cvode version 2) and instrumenting test_nrntest_fast.py to print the test name and the max difference between the current run and the standard test_nrntest_fast.json, python -m pytest -s test_nrntest_fast.py gives (with -ffp-contract=off on the M1):

image

The instrumentation changes are:

diff --git a/test/pytest_coreneuron/test_nrntest_fast.py b/test/pytest_coreneuron/test_nrntest_fast.py
index fabf61978..4256f2544 100644
--- a/test/pytest_coreneuron/test_nrntest_fast.py
+++ b/test/pytest_coreneuron/test_nrntest_fast.py
@@ -49,10 +49,13 @@ def chk():
     dir_path = os.path.dirname(os.path.realpath(__file__))
     fname = "test_nrntest_fast.json"
     if True:
-        if h.CVode().version().split(".")[0] == "3":
-            fname = "test_nrntest_fast_cv3.json"
-            global cvtol
-            cvtol = cv3tol
+        try:
+            if h.CVode().version().split(".")[0] == "3":
+                fname = "test_nrntest_fast_cv3.json"
+                global cvtol
+                cvtol = cv3tol
+        except:
+            pass
 
     print(fname)
     checker = Chk(os.path.join(dir_path, fname))
@@ -170,6 +173,7 @@ def compare_time_and_voltage_trajectories(
     method = model_data["method"]  # cvode or fixed
 
     # Determine which data we will use as a reference
+    title = "%s %d %s %s"%(name, threads, field, method)
     if threads == 1:
         # threads=1: compare to reference from JSON file on disk
         key = name + ":"
@@ -178,6 +182,7 @@ def compare_time_and_voltage_trajectories(
         else:
             key += method
         ref_data = chk.get(key, None)
+
         if ref_data is None:
             # No comparison to be done; store the data as a new reference
             chk(key, model_data[threads])
@@ -216,14 +221,19 @@ def compare_time_and_voltage_trajectories(
         these_vals = this_data[name][field]
         ref_vals = ref_data[name][field]
         assert len(these_vals) == len(ref_vals)
+        i=-1
         for a, b in zip(these_vals, ref_vals):
-            match = math.isclose(a, b, rel_tol=tolerance)
+            i += 1
+            match = math.isclose(a, b, rel_tol=0.0)#tolerance)
             if match:
                 continue
             diff = abs(a - b) / max(abs(a), abs(b))
+#            print(name, i, a, b, diff)
             max_diff = max(diff, max_diff)
+    print(title, "max_diff = ", max_diff, tolerance)
     if max_diff > tolerance:
-        raise Exception("max diff {} > {}".format(max_diff, tolerance))
+        # raise Exception("max diff {} > {}".format(max_diff, tolerance))
+        pass

Given this observation, it seems reasonable to focus on the fixed step difference between x86_64 and M1.
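For reference, the quantity printed by the instrumented test can be computed on its own. This is a minimal sketch of the same metric as in the diff above; the function name max_rel_diff and the sample values are made up for illustration.

```python
def max_rel_diff(these_vals, ref_vals):
    """Largest relative difference between two equal-length sequences."""
    assert len(these_vals) == len(ref_vals)
    max_diff = 0.0
    for a, b in zip(these_vals, ref_vals):
        if a == b:
            # Identical values contribute nothing; this also avoids 0/0.
            continue
        diff = abs(a - b) / max(abs(a), abs(b))
        max_diff = max(diff, max_diff)
    return max_diff


# Made-up trajectories: only the last sample differs.
print(max_rel_diff([-65.0, -64.5, 2.0], [-65.0, -64.5, 4.0]))
```

A result of exactly 0.0 means the run reproduces the reference bit for bit, which is the condition the instrumented test checks by passing rel_tol=0.0.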

@nrnhines
Member Author

nrnhines commented Oct 14, 2024

Just a note. On Mac M1, I get identical results between the default and -ffp-model=precise. I also get identical results between -ffp-model=strict and -ffp-contract=off. This is consistent with the table at https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior

image

Here is the difference between precise and contract off:
image

@nrnhines
Member Author

nrnhines commented Oct 15, 2024

In addition to contract=off, using branch hines/digest-debug-3 along with h.use_exp_pow_precision(1) fixes the fixed step issues for python3 -m pytest -s test_nrntest_fast.py, i.e.
image
