
ocrd_tesserocr processors waste CPU performance because of numpy blas threads #157

Open
stweil opened this issue Oct 3, 2020 · 6 comments


stweil (Contributor) commented Oct 3, 2020

The current code imports numpy although it only uses a single function from that library. Importing numpy starts a number of threads for the BLAS algorithms by default, and those threads use a lot of CPU time without doing anything useful.

Setting the environment variable OMP_THREAD_LIMIT=1 avoids those additional threads.

Perhaps there is a better solution that does not require an environment variable, for example dropping the numpy requirement.

bertsky (Collaborator) commented Oct 4, 2020

> The current code imports numpy although it only uses a single function from that library.

I can only see np.round in ocrd-tesserocr-segment-region, and only under very rare circumstances.

> Including numpy creates a number of threads for the BLAS algorithms by default. Those threads use a lot of CPU time without doing anything useful.

Are you saying a function that does not even get called most of the time is consuming CPU time because of some multi-threaded library? How is that? Did you measure or bisect that?

> Setting the environment variable OMP_THREAD_LIMIT=1 avoids those additional threads.

That's what workflow-configuration does whenever you run with multiple jobs.

stweil (Contributor, Author) commented Oct 4, 2020

@bertsky, it's not the function, it's the import statement that starts the threads which burn the CPU time.

bertsky (Collaborator) commented Oct 4, 2020

> it's not the function - it's the import statement which starts the threads which burn the CPU time.

Did you cross-check that (deactivating the import statement and measuring again)?

(I have a hard time believing an unused module/function can burn CPU time.)

stweil (Contributor, Author) commented Oct 5, 2020

You are right, the function is used for some pages. But even after removing the import statement and the function call, there remain 3 threads which use CPU time in my test; one of them is producing OCR. In GDB I see 6 threads (my CPU supports 6 threads), 5 of them looking like this:

(gdb) thr 7
[Switching to thread 7 (Thread 0x7fffe78c8700 (LWP 521057))]
#0  0x00007ffff7d54067 in sched_yield () at ../sysdeps/unix/syscall-template.S:120
120	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) i s
#0  0x00007ffff7d54067 in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x00007fffefeda4f2 in blas_thread_server ()
   from /venv-20201001/lib/python3.7/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-34a18dc3.3.7.so
#2  0x00007ffff7f8bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#3  0x00007ffff7d6ceaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

So the problem remains, but my assumption about its cause was wrong.

stweil (Contributor, Author) commented Oct 5, 2020

I have now checked thread creation in GDB. Even after removing the numpy code from segment_region.py, numpy still gets imported (presumably indirectly) and starts 5 blas_thread_server threads. The process then also creates lots of short-lived other threads, evidently triggered by shapely.

During execution I see 3 threads (always the same PIDs) using the CPU. By attaching GDB to one of them I could confirm that it is a blas_thread_server thread, so the subject of this issue is correct.

bertsky (Collaborator) commented Feb 12, 2021

@stweil, this OpenBLAS issue looks related to what you describe. But it was fixed 5 years ago, so I guess the fix is already deployed on most systems we use today. (I just learned that you need to install libatlas3-base liblapack3 libopenblas-base to make numpy use these backends. Not sure about our Docker images... And I don't understand numpy.__config__.show() yet.)
