Skip to content

Latest commit

 

History

History
68 lines (50 loc) · 7.21 KB

BZ-19329.md

File metadata and controls

68 lines (50 loc) · 7.21 KB

BZ-19329 Patch

Summary

This repository provides a method for working around the sporadic issue seen on older linux distributions: MathWorks® products can trigger an assert failure at concurrent pthread_create and dlopen (BZ-19329) in the GNU C Libraries (glibc).

If you are running

  • ubuntu-based systems and can upgrade to version 22.04 (Jammy Jellyfish) this is the safest and easiest way to alleviate the issue, since that version contains glibc v2.35 in which the underlying issue is completely fixed.
  • RHEL-based 8.4 or 8.5 systems (update 27 June 2022). It appears that RHEL have patched the glibc-2.28 packages in release 189 to fix this issue. Ensure that you have installed at least glibc-2.28-189.1.el8.

If instead you want to work around this issue, you can use this repository. It provides a build procedure (in an isolated Docker® container) to produce patched versions of the glibc libraries for recent Almalinux, Ubuntu® and Debian® releases. These patched versions incorporate an initial fix proposed on the libc-alpha mailing list that mitigate the issue. In the release area of this repository you can find the debian package build artefacts produced by running the build on Ubuntu 18.04 & 20.04 as well as Debian 9, 10 & 11. You can install these artefacts on an appropriate debian-based machine, virtual machine or docker container, using dpkg -i. For Almalinux you cand find the appropriate rpm's which should also work on UBI and CentOS containers.

Bug Description

The assert failure at concurrent pthread_create and dlopen glibc bug was first reported in December 2015 and can affect any process on Linux that creates a thread at the same time as opening a dynamic shared object library. Initially the issue was only observable with reasonable frequency on very large scale systems such as high performance computing clusters or cloud scale deployment platforms and so did not receive significant attention. However, early on there were proposed patches to the library. Large scale systems applied those patches in-house and saw significant benefit. More recently a proposed complete fix for this and a set of related issues has been reviewed by the glibc team and accepted into version 2.34 of glibc (released in August 2021). The 2.34 version of glibc is available in RHEL 9 beta and Ubuntu 21.10 (Impish Indri). However, there are no plans to backport the fix into previous glibc versions and it is expected that previous versions will be in production use for a significant number of years (e.g. the current end-of-life date for Ubuntu:20.04 is April 2030).

More recently MathWorks products have made extensive use of a C++ micro-services architecture. This architecture leads to a more dynamic system in which library modules are loaded at the point of use. As a result, the MATLAB® process is more likely to load a library at the same time as creating a thread, and so is more likely to encounter this glibc bug. When this issue is encountered the console that opened MATLAB shows a message similar to the following:

Inconsistency detected by ld.so: ../elf/dl-tls.c: 597: _dl_allocate_tls_init: Assertion 'listp != NULL' failed!

or

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

There might also be a stack trace file called matlab_crash_dump.${PID} in the users home folder or the current working folder. This usually indicates that a segmentation violation has been detected and the stack trace starts with something similar to the following:

Stack Trace (from fault):
[  0] 0x00002b661142d5a0    /lib64/ld-linux-x86-64.so.2+00075168 _dl_allocate_tls_init+00000080
[  1] 0x00002b66120c187c    /usr/lib64/libpthread.so.0+00034940 pthread_create+00001884

If you see these or similar signatures at a sufficient frequency on a system, you might want to consider patching glibc on that system, machine or container.

RHEL 8.4 & 8.5 Update (27 June 2022)

RHEL have just integrated the BZ-19329 patch into glibc-2.28-189.1.el8. It appear that the change actually went into build 2.28-175 and got released with 2.28-189.

Unless you need to use a pre-189 release of the package you should no longer need to use this repository to patch RHEL and AlmaLinux for BZ-18329

Patch sources

These patches all derive from an original patch put together by Szabolcs Nagy in January 2016. The 2.24 to 2.28 patches in this repo are derived from this original e-mail and can be downloaded directly from the archive of the [email protected] mailing list where they were proposed:

These 2 patches are directly linked in the original bug report in comment 7 by Pádraig Brady. In addition, the bug report also has a reference to the original Szabolcs Nagy patch in comment 4 (dated January 2016). The 2 messages above refer back to that original patch via a message describing the overall problem in more detail.

In addition, in Sept 2017 Pádraig Brady pointed out that there was an off-by-one error in the original patch that needs to be included

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
        }

       total += cnt;
-      if (total >= dtv_slots)
+      if (total > dtv_slots)
        break;

       /* Synchronize with dl_add_to_slotinfo.  */

This is source for the final unsubmitted-bz19329-fixup.v2.27.patch

In glibc v2.31 the original source code changed significantly and the patches needed to be slightly adapted so as to match the new codebase. These adapted patches are included here in the patches/2.31 folder and soft-linked from 2.32 and 2.33.

Acknowledgement and thanks

Many thanks to the broader glibc team and particularly Szabolcs Nagy for providing the original patches and for fixing these issues in glibc v2.34.