
Regression Demo does not converge on Apple M1/ARM Mac #9

Open
schuderer opened this issue Jul 1, 2022 · 3 comments


schuderer commented Jul 1, 2022

Thank you for providing this library.

I had been using the separate RegressionTsetlinMachine with some success in my experiments but needed to reduce training time. After some tests with pyTsetlinMachine and tmu, I decided to swap in pyTsetlinMachine for a speed-up of several orders of magnitude. Oddly enough, my use case no longer converged with pyTsetlinMachine's RegressionTsetlinMachine (using the same parameters as before: clauses=10000, T=10000, s=2.5).

There are no errors or warnings.

Digging deeper, I found that running RegressionDemo.py only reaches an RMSD of 1.21 (instead of the RMSD given in the readme), with a MAD of 1.0. As a sanity check, I also tried a synthetic example consisting of binary feature vectors of 80 ones associated with a target value of 300, and all-zero feature vectors associated with a target of 0. The predictions look pretty random, and the MAD is 150. This is with the example's unmodified hyperparameters (the demo uses 80 feature bits, too).
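For reference, the synthetic sanity check described above can be reconstructed roughly as follows (a minimal sketch: the dataset-building helper, class sizes, and the MAD calculation are my own reconstruction, not code from the report). Note that a predictor stuck at the mid-range value 150, which is roughly what random-looking predictions average out to here, reproduces the MAD of 150 mentioned above:

```python
# Synthetic sanity-check dataset: 80-bit binary feature vectors.
# All-ones -> target 300, all-zeros -> target 0.
def make_dataset(n_per_class=100, n_bits=80):
    X = [[1] * n_bits for _ in range(n_per_class)] + \
        [[0] * n_bits for _ in range(n_per_class)]
    y = [300] * n_per_class + [0] * n_per_class
    return X, y

def mad(y_true, y_pred):
    # Mean absolute deviation between targets and predictions.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

X, y = make_dataset()
# A constant mid-range predictor gives exactly the reported MAD:
print(mad(y, [150] * len(y)))  # -> 150.0
```

A regressor that has actually learned this trivially separable mapping should instead drive the MAD close to 0.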

When a colleague of mine tried it on his Intel VM, both the Regression Demo and the sanity check converged as expected, as they did when I tried it on a Windows 10 laptop (thanks to the steps described in #7).

My system is the only Apple M1 system I have available for testing. If anyone is able to try it on a comparable system, that would help narrow it down. I suspect it might have something to do with the architecture (ARM vs. Intel). My Python interpreter and C compiler are both arm64 native:

MacBook Pro M1 (arm64, 2021), macOS 12.4
Python 3.8.12 (arm64 native)
gcc: Apple clang version 13.1.6 arm64-apple-darwin21.5.0

Things I've tried out so far without any success:

  • Installing from the repo and a cloned local copy instead of PyPI
  • Trying the Windows 10 installation workaround (cannot install pyTsetlinMachine on windows10? #7) on my Mac as well
  • Removing the -O3 and -ffast-math optimization switches from the makefile's gcc calls
  • Building a wheel locally and installing
  • Pruning pip cache before reinstalling
  • Trying different sets of hyperparameters
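For context, the optimization-flag change mentioned above amounts to something like the following (a hypothetical makefile excerpt; the actual variable names and files in pyTsetlinMachine's build may differ). `-ffast-math` is a plausible suspect because it relaxes IEEE-754 floating-point semantics and can therefore behave differently across compilers and architectures:

```makefile
# Before (hypothetical): aggressive optimization flags
# CFLAGS = -O3 -ffast-math -fPIC

# After: drop -O3 and -ffast-math to rule out optimizer/FP-semantics issues
CFLAGS = -fPIC
```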
olegranmo (Member) commented:
Hi @schuderer! Did you manage to solve the problem? Just upgraded to MacBook Pro M1 Max and tested the PyTsetlinMachine RegressionDemo. I am running:
Python 3.10.7 (main, Sep 14 2022, 22:38:23) [Clang 14.0.0 (clang-1400.0.29.102)]
gcc Apple clang version 14.0.0 (clang-1400.0.29.102).

It runs as expected on my side:
python3.10 ./examples/RegressionDemo.py

RMSD over 25 runs:

#1 RMSD: 0.61 +/- 0.00 (7.43s)
#2 RMSD: 0.61 +/- 0.00 (7.56s)
#3 RMSD: 0.60 +/- 0.00 (7.60s)
#4 RMSD: 0.61 +/- 0.00 (7.41s)
#5 RMSD: 0.61 +/- 0.01 (7.43s)
...


schuderer commented Oct 13, 2022

Hi @olegranmo, thank you very much for testing it out. I'm afraid I don't remember whether I got it to work eventually. I've since moved on to your TMU project's regression implementation, which is still faster than the original RegressionTsetlinMachine. Fortunately, I got additional resources to carry out a parameter search, so TMU was also acceptable. But it's good to hear that pyTsetlinMachine should work now, too! Still, I'm wondering why the newer TMU implementation appears to be slower than pyTsetlinMachine.


olegranmo commented Oct 13, 2022

Great that it worked out! In the TMU implementation, a larger part of the code has been moved over to Python/NumPy; only the clauses are evaluated and updated in CUDA/C. This makes it easier to create and experiment with new kinds of architectures. However, for smaller numbers of features/clauses, the Python overhead becomes a bottleneck. At some point, I plan to move functionality back into C/CUDA, or alternatively use Numba to optimize the Python side. :-)
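The per-call interpreter overhead described above can be illustrated with a toy pure-Python sketch (my own example, not TMU code): an explicit Python loop pays bytecode-dispatch cost on every iteration, while the built-in `sum()` runs its loop in C, loosely analogous to keeping the clause updates in C:

```python
import timeit

def python_loop(xs):
    # Explicit Python-level loop: interpreter overhead per element.
    total = 0
    for x in xs:
        total += x
    return total

xs = list(range(100_000))
# Time both approaches over the same data; the C-level sum() loop is
# typically several times faster, and the gap is pure interpreter overhead.
t_py = timeit.timeit(lambda: python_loop(xs), number=50)
t_c = timeit.timeit(lambda: sum(xs), number=50)
print(f"Python loop: {t_py:.4f}s, built-in sum(): {t_c:.4f}s")
```

The same effect is why a mostly-Python pipeline can lose to an all-C one when each unit of work (few features/clauses) is too small to amortize the per-call cost.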
