Optimize BSS use of FFT with cupy, speed up of up to 3x for full tracks #83

Open
sevagh opened this issue Apr 22, 2021 · 6 comments

sevagh commented Apr 22, 2021

Hello,
I have been working on some potential performance optimizations for the BSS evaluation (which is rather slow/compute intensive for full tracks).

Baseline measurement with original museval code (the total execution involves also computing the IRM, adapted from https://github.com/sigsep/sigsep-mus-oracle/blob/master/IRM.py):

museval bss original execution time, 1 track of musdb
pybin: /home/sevagh/venvs/museval-orig/bin/python3
evaluating track AM Contra - Heart Peripheral

real    3m22.702s
user    3m21.577s
sys     0m39.376s

The original code takes ~3:20 minutes.

The second optimization uses cupy and the GPU, which is in my opinion a big cost/burden for end users. Installing the CUDA toolkit etc. is no joke. Here is the code: master...sevagh:feat/cupy-accel
However, the performance is rather good at ~1:20 minutes, roughly 2.5x faster than the original code:

museval bss optimization 2 (cupy on gpu) execution time, 1 track of musdb
pybin: /home/sevagh/venvs/museval-optimization-2/bin/python3
evaluating track AM Contra - Heart Peripheral

real    1m19.801s
user    1m27.077s
sys     0m29.615s
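For reference, the speedup can be worked out directly from the two "real" lines above. This is just a small sketch; the helper name real_seconds is made up for illustration and is not part of museval:

```python
import re


def real_seconds(time_line: str) -> float:
    """Convert a `time` output line such as 'real    3m22.702s' to seconds."""
    match = re.search(r"(\d+)m([\d.]+)s", time_line)
    if match is None:
        raise ValueError(f"unrecognized time format: {time_line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times copied from the two runs above
baseline = real_seconds("real    3m22.702s")  # original museval code
cupy_run = real_seconds("real    1m19.801s")  # cupy on GPU

speedup = baseline / cupy_run
print(f"{baseline:.3f}s -> {cupy_run:.3f}s, speedup {speedup:.2f}x")
```

Only the wall-clock time is compared here; the user and sys times are left out because they add up CPU time across threads.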

One final note is that the CUDA/cupy version has slight differences in the outputs due to numerical precision differences. It doesn't look too significant to me - here's an excerpt of a diff between the evaluated json files, showing small differences in the BSS scores:

@@ -10459,8 +10459,8 @@
-            "SAR": 30.60528,
-            "ISR": 30.67039
+            "SAR": 30.60525,
+            "ISR": 30.67036
@@ -10469,8 +10469,8 @@
-            "SAR": 30.45440,
-            "ISR": 30.52629
+            "SAR": 30.45438,
+            "ISR": 30.52627
@@ -10480,7 +10480,7 @@
-            "ISR": 20.99668
+            "ISR": 20.99667
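To put a number on "not too significant", here is a rough sketch that takes the score pairs from the diff excerpt above and checks the largest absolute difference. The tolerance of 1e-3 dB is an arbitrary value picked for illustration, not something museval defines:

```python
# (original, cupy) score pairs copied from the diff excerpt, in dB
score_pairs = {
    "hunk 1 SAR": (30.60528, 30.60525),
    "hunk 1 ISR": (30.67039, 30.67036),
    "hunk 2 SAR": (30.45440, 30.45438),
    "hunk 2 ISR": (30.52629, 30.52627),
    "hunk 3 ISR": (20.99668, 20.99667),
}

# Hypothetical tolerance, chosen only for this example
TOLERANCE_DB = 1e-3

max_abs_diff = max(abs(orig - gpu) for orig, gpu in score_pairs.values())
within_tolerance = max_abs_diff < TOLERANCE_DB

print(f"largest difference: {max_abs_diff:.1e} dB, within tolerance: {within_tolerance}")
```

This only covers the five values visible in the excerpt, so it says nothing about the rest of the json file.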

I'm also trying to find a way to use CPU parallelism with scipy.fft and to combine several of the FFTs into a single call, but this isn't helping nearly as much as the CUDA change. My code attempts can be seen here: master...sevagh:multiple-1d-fft
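As a minimal sketch of the idea (not the code in the linked branch), several 1D signals can be stacked as rows of a 2D array and transformed in one scipy.fft.fft call, with the workers argument spreading the rows over CPU cores. The array shape and the random data are assumptions made for the example:

```python
import numpy as np
import scipy.fft

rng = np.random.default_rng(0)
# 8 hypothetical 1D signals of 4096 samples, one per row
signals = rng.standard_normal((8, 4096))

# Baseline: one FFT call per signal
one_by_one = np.stack([scipy.fft.fft(row) for row in signals])

# Batched: a single call along the last axis; workers=-1 uses all available cores
batched = scipy.fft.fft(signals, axis=-1, workers=-1)

# Both approaches should give the same spectra
print(np.allclose(one_by_one, batched))
```

Whether the batched call is actually faster depends on the number and length of the signals, which would be consistent with the modest gains reported above.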

I'm aware of the separate repo for bss at https://github.com/sigsep/bsseval/ but I wasn't sure which project to discuss it in - I'm using museval because I'm trying to recreate the SiSec 2018 testbench.

Author

sevagh commented Apr 22, 2021

Also there could be a "super-performant" config with cupy, stacking multiple 1D FFTs (respecting GPU memory allocation limits), and using pinned host/gpu memory and FFT plans - I'll continue working in that direction.
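A rough sketch of the "stack FFTs but respect memory limits" part could look like the following. The function name chunked_fft, the max_bytes budget and the 16-bytes-per-bin estimate are all assumptions for illustration. It runs on numpy as written; passing cupy as xp is assumed to work because cupy.fft.fft mirrors the numpy call, but that path is untested here, and pinned memory and FFT plans are not covered:

```python
import numpy as np


def chunked_fft(signals, n_fft, max_bytes, xp=np):
    """FFT every row of `signals`, batching as many rows per call as fit in `max_bytes`.

    `xp` is the array module (numpy by default; cupy is assumed to be a drop-in).
    """
    n_rows = signals.shape[0]
    # Rough budget: a complex128 output row needs 16 bytes per FFT bin
    rows_per_chunk = max(1, max_bytes // (n_fft * 16))
    pieces = []
    for start in range(0, n_rows, rows_per_chunk):
        chunk = xp.asarray(signals[start:start + rows_per_chunk])
        pieces.append(xp.fft.fft(chunk, n=n_fft, axis=-1))
    return xp.concatenate(pieces, axis=0)


rng = np.random.default_rng(0)
signals = rng.standard_normal((8, 1000))

# Budget that fits 3 rows of a 1024-point FFT, so the 8 rows go in chunks of 3, 3, 2
spectra = chunked_fft(signals, n_fft=1024, max_bytes=3 * 1024 * 16)
print(spectra.shape)
```

The budget only accounts for the output array, so a real implementation would need a more careful estimate that includes inputs and FFT work buffers.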

Author

sevagh commented Apr 24, 2021

Optimized every slow line (discovered through kernprof + line_profiler): master...sevagh:feat/cupy-accel

This leads to just about 1 minute to compute the IRM mask and perform a BSS evaluation on 1 full-length MUSDB18 track:

real    1m1.762s
user    0m50.948s
sys     0m13.620s

This is down from the 3+ minutes originally:

real    3m22.702s
user    3m21.577s
sys     0m39.376s
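For anyone wanting to repeat the kernprof + line_profiler step, a minimal sketch of the usual workflow is below. The autocorrelation function is a toy stand-in, not a function from museval. kernprof injects a profile decorator when it runs the script, so the fallback keeps the script runnable on its own:

```python
import numpy as np

# kernprof injects `profile` into builtins; fall back to a no-op when run directly
try:
    profile
except NameError:
    def profile(func):
        return func


@profile
def autocorrelation(x):
    """Toy FFT-based linear autocorrelation, used here only as something to profile."""
    n_fft = 2 * len(x)  # zero-pad so the circular correlation equals the linear one
    spectrum = np.fft.rfft(x, n=n_fft)
    power = spectrum * np.conj(spectrum)
    return np.fft.irfft(power, n=n_fft)[: len(x)]


x = np.arange(8, dtype=float)
lag0 = autocorrelation(x)[0]  # lag 0 equals the signal energy, sum(x**2)
print(lag0)
```

Saved as a script (the file name is up to you), it can be run with kernprof -l -v followed by the script name to print per-line timings for every decorated function.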

Member

faroit commented Aug 6, 2021

@sevagh I think this would be great. Do the regression tests pass using this?

Author

sevagh commented Aug 6, 2021

How can I run the tests? python setup.py test?

Member

faroit commented Aug 10, 2021

Install the test environment with pip install .[tests] and then run

py.test tests/test_regression.py -vs

Author

sevagh commented Aug 10, 2021

OK. My most recent commits get the regression tests passing. Casting explicitly to float32 was creating huge errors in SAR/SIR/ISR, so I just removed the casts.

I made the cupy install optional (although fixed to CUDA 11.4, which is rather recent).

One other note/idiosyncrasy is that it's best to clear the cupy FFT cache between BSS evaluations of large songs. That's why I added this helper function:
master...sevagh:feat/cupy-accel#diff-cc17d32a9d811e616624c2f2699f853dd06b143931ea9e37a6cc0dab6a4b8ab9R75-R88

In real code you would do:

for track in mus.tracks:
    ...
    scores = museval.eval_mus_track(...) # cupy under the hood
    museval.clear_cupy_cache()
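The helper itself lives in the linked diff and is not reproduced here. As a hypothetical sketch of what a cache-clearing function of this kind might do, the following uses cupy's public plan cache and memory pool calls. The name clear_gpu_fft_caches is made up, and this should not be read as the actual museval implementation:

```python
def clear_gpu_fft_caches() -> bool:
    """Hypothetical stand-in for a cache-clearing helper; not museval's actual code.

    Returns True if the caches were cleared, False if cupy or a GPU is unavailable.
    """
    try:
        import cupy as cp
    except ImportError:
        return False  # cupy is not installed, so there is nothing to clear

    try:
        # Drop the cached cuFFT plans held for the current device
        cp.fft.config.get_plan_cache().clear()
        # Return unused device and pinned host memory held by cupy's pools
        cp.get_default_memory_pool().free_all_blocks()
        cp.get_default_pinned_memory_pool().free_all_blocks()
    except cp.cuda.runtime.CUDARuntimeError:
        return False  # cupy is installed but no usable CUDA device was found
    return True


cleared = clear_gpu_fft_caches()
print(cleared)
```

Calling something like this after each track keeps cached FFT plans and pooled memory from one large song from piling up before the next one is evaluated.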

Passing regression test:

(museval-cupy) sevagh:sigsep-mus-eval $ py.test tests/test_regression.py -vs
===================================================== test session starts =====================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /home/sevagh/venvs/museval-cupy/bin/python
cachedir: .pytest_cache
rootdir: /home/sevagh/repos/sigsep-mus-eval, configfile: setup.cfg
collected 4 items

tests/test_regression.py::test_aggregate[Music Delta - 80s Rock]     time         target metric     score                   track
[...]
Aggrated Scores (median over frames, median over tracks)
vocals          ==> SDR: -15.622  SIR:   9.165  ISR:  -8.476  SAR:  -7.327
accompaniment   ==> SDR: -13.290  SIR: -18.765  ISR:  -0.322  SAR:  -7.427

PASSED
tests/test_regression.py::test_track_scores[Music Delta - 80s Rock] PASSED
tests/test_regression.py::test_random_estimate[Music Delta - 80s Rock] PASSED
tests/test_regression.py::test_one_estimate[Music Delta - 80s Rock] PASSED

====================================================== warnings summary =======================================================
../../venvs/museval-cupy/lib/python3.9/site-packages/past/builtins/misc.py:45
  /home/sevagh/venvs/museval-cupy/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    from imp import reload

tests/test_regression.py: 12 warnings
  /home/sevagh/repos/sigsep-mus-eval/museval/metrics.py:601: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    eps = np.finfo(np.float).eps

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=============================================== 4 passed, 13 warnings in 46.33s ===============================================
