[WIP] DEBUG only {2023.06,2023a} PyTorch-bundle v2.1.2 #603

Open
wants to merge 16 commits into
base: 2023.06-software.eessi.io

Conversation

trz42
Collaborator

@trz42 trz42 commented Jun 12, 2024

The main purpose of this PR is to facilitate debugging various issues when building PyTorch-bundle and to demonstrate approaches that could solve them. The fixes provided here are not expected to be final.

  • includes a fix for find_library provided by ctypes.util which prevented importing soundfile
    • superseded by fixing it in the Python installations
  • includes a fix for aarch64/{generic,neoverse_n1,neoverse_v1} where importing sentencepiece led to the error libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
  • includes a fix for the extension torchvision where some library was not compiled with jpeg support, hence some tests failed

Initially we will disable all fixes, build for selected architectures and document the errors. We then enable the fixes one by one and document the results (some errors fixed, some new errors, ...).

Note: see the original PR for PyTorch-bundle (#585) for additional discussion of some of the issues listed above.

- PR to help debugging various issues when building PyTorch-bundle
- includes a fix for `find_library` provided by `ctypes.util` which prevented
  importing `sndfile`
- includes a fix for `aarch64/generic` where importing `sentencepiece` led to
  the error `libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block`
- includes a fix for the extension `torchvision` where some library was not
  compiled with `jpeg` support, hence some tests failed
@trz42 added the labels "help wanted" and "2023.06-software.eessi.io" on Jun 12, 2024

eessi-bot bot commented Jun 12, 2024

Instance eessi-bot-mc-aws is configured to build:

  • arch x86_64/generic for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/generic for repo eessi-hpc.org-2023.06-software
  • arch x86_64/generic for repo eessi.io-2023.06-compat
  • arch x86_64/generic for repo eessi.io-2023.06-software
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-compat
  • arch x86_64/intel/haswell for repo eessi.io-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-compat
  • arch x86_64/intel/skylake_avx512 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen2 for repo eessi.io-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen3 for repo eessi.io-2023.06-software
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/generic for repo eessi-hpc.org-2023.06-software
  • arch aarch64/generic for repo eessi.io-2023.06-compat
  • arch aarch64/generic for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_n1 for repo eessi.io-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi-hpc.org-2023.06-software
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-compat
  • arch aarch64/neoverse_v1 for repo eessi.io-2023.06-software


eessi-bot bot commented Jun 12, 2024

Instance eessi-bot-mc-azure is configured to build:

  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi-hpc.org-2023.06-software
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-compat
  • arch x86_64/amd/zen4 for repo eessi.io-2023.06-software

@trz42
Collaborator Author

trz42 commented Jun 12, 2024

Initially we'll build only for zen2 and aarch64/generic...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software


eessi-bot bot commented Jun 12, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:


eessi-bot bot commented Jun 12, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
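
The bot updates above show each command in a short form and an "expanded format". A minimal sketch of that expansion follows. The function name `expand_bot_command` and the alias table are hypothetical, inferred only from the two forms quoted in this thread, and are not taken from the bot's source.

```python
# Hypothetical alias table, inferred from the "expanded format" lines above.
KEY_ALIASES = {"arch": "architecture", "repo": "repository"}


def expand_bot_command(line):
    """Expand 'bot: build arch:X repo:Y' to 'build architecture:X repository:Y'.

    Returns None if the line is not addressed to the bot or carries no command.
    """
    prefix = "bot:"
    stripped = line.strip()
    if not stripped.startswith(prefix):
        return None
    tokens = stripped[len(prefix):].split()
    if not tokens:
        return None
    action, args = tokens[0], tokens[1:]
    expanded = [action]
    for arg in args:
        key, sep, value = arg.partition(":")
        # Only key:value arguments are rewritten; anything else is kept as-is.
        expanded.append(f"{KEY_ALIASES.get(key, key)}:{value}" if sep else arg)
    return " ".join(expanded)


print(expand_bot_command("bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software"))
```

Unknown keys pass through unchanged, so the sketch does not guess at aliases that are not visible in this conversation.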


eessi-bot bot commented Jun 12, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12607

  • fails in the sanity check for librosa/0.10.1-foss-2023a when running python -c "import soundfile" with the log messages
== 2024-06-12 12:00:43,829 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: extensions sanity check failed for 1 extensions: soundfile
failing sanity check for 'soundfile' extension: command "python -c "import soundfile"" failed; output:
Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 161, in <module>
    import _soundfile_data  # ImportError if this doesn't exist
    ^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_soundfile_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 171, in <module>
    _snd = _ffi.dlopen(_libname)
           ^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so.1': libsndfile.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 192, in <module>
    _snd = _ffi.dlopen(_explicit_libname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory,  (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
  • to work around this error we need a custom ctypes
date job status comment
Jun 12 11:27:18 UTC 2024 submitted job id 12607 awaits release by job manager
Jun 12 11:28:21 UTC 2024 released job awaits launch by Slurm scheduler
Jun 12 11:35:26 UTC 2024 running job 12607 is running
Jun 12 12:08:26 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-12607.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1718193717.tar.gz
size: 162 MiB (170635688 bytes)
entries: 6322
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
imageio/2.33.1-gfbf-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
imageio/2.33.1-gfbf-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jun 12 12:08:26 UTC 2024 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-12607.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case
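
The build-job checks listed under Details in the comment above can be read as a set of "must be present" and "must be absent" messages in the job output. A rough sketch of that logic, assuming plain substring matching (the bot's real patterns may differ) and using the hypothetical names `CHECKS` and `check_job_output`:

```python
# (message, must_be_present) pairs taken from the Details list above.
# Plain substring matching is an assumption made for this sketch.
CHECKS = [
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def check_job_output(text):
    """Return (overall_ok, per_check_results) for the given job output text."""
    results = {}
    for message, must_be_present in CHECKS:
        present = message in text
        # A check passes when presence matches the expectation.
        results[message] = present == must_be_present
    return all(results.values()), results


ok, details = check_job_output("No missing installations\nfoo.tar.gz created!\n")
print(ok, details)
```

Under this reading a job is reported as a failure as soon as one expectation is violated, which matches the jobs in this thread where the tarball was created but an ERROR message was still found.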


eessi-bot bot commented Jun 12, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12608

  • fails in the sanity check for librosa/0.10.1-foss-2023a when running python -c "import soundfile" with the log messages
== 2024-06-12 11:55:32,669 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: extensions sanity check failed for 1 extensions: soundfile
failing sanity check for 'soundfile' extension: command "python -c "import soundfile"" failed; output:
Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 161, in <module>
    import _soundfile_data  # ImportError if this doesn't exist
    ^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named '_soundfile_data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 171, in <module>
    _snd = _ffi.dlopen(_libname)
           ^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so.1': libsndfile.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/librosa/0.10.1-foss-2023a/lib/python3.11/site-packages/soundfile.py", line 192, in <module>
    _snd = _ffi.dlopen(_explicit_libname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory,  (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
  • to work around this error we need a custom ctypes
date job status comment
Jun 12 11:27:22 UTC 2024 submitted job id 12608 awaits release by job manager
Jun 12 11:28:19 UTC 2024 released job awaits launch by Slurm scheduler
Jun 12 11:34:23 UTC 2024 running job 12608 is running
Jun 12 12:04:20 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-12608.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1718193401.tar.gz
size: 152 MiB (160274969 bytes)
entries: 6322
modules under 2023.06/software/linux/aarch64/generic/modules/all
imageio/2.33.1-gfbf-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
imageio/2.33.1-gfbf-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Jun 12 12:04:20 UTC 2024 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-12608.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case
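
To compare test results across jobs, the ReFrame summary line quoted in these bot comments can be parsed into numbers. This is a small sketch based only on the line format visible in this thread. `parse_reframe_summary` is a hypothetical helper, not part of ReFrame or the bot.

```python
import re

# Pattern written against the summary line as it appears in this thread.
SUMMARY_RE = re.compile(
    r"\[\s*(?P<status>FAILED|PASSED)\s*\]\s*"
    r"Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) "
    r"\((?P<failures>\d+) failure\(s\), (?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)


def parse_reframe_summary(line):
    """Return a dict with the status and counters, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    # Keep the status as text and convert every counter to an integer.
    return {
        key: (value if key == "status" else int(value))
        for key, value in match.groupdict().items()
    }


line = "[ FAILED ] Ran 12/12 test case(s) from 12 check(s) (2 failure(s), 0 skipped, 0 aborted)"
print(parse_reframe_summary(line))
```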

@EESSI deleted 4 comments from eessi-bot (bot) on Jun 12, 2024
@trz42
Collaborator Author

trz42 commented Jun 15, 2024

The two jobs (12607 and 12608), which did not include any fixes, both failed in the sanity check for librosa. After enabling the fixes for that by

  • installing a custom ctypes library;
  • adding a parse_hook to use the custom ctypes library in the sanity check; and
  • adding a pre_module_hook that adds a setting to use this custom ctypes library when the module for librosa is loaded;

we repeat the build for the same architectures, zen2 and aarch64/generic...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
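
For context on why a custom ctypes helps: the soundfile traceback in the earlier jobs shows the library lookup ending in a failed dlopen of libsndfile.so. The sketch below illustrates the general idea of a lookup with a fallback. It is not the code of the custom_ctypes package. The function name `find_library_with_fallback` and the choice of environment variables to scan are assumptions made for illustration.

```python
import ctypes.util
import os


def find_library_with_fallback(name, env_vars=("LD_LIBRARY_PATH", "LIBRARY_PATH")):
    """Return a library name or path for `name`, or None if nothing is found."""
    # First try the standard lookup from the Python standard library.
    found = ctypes.util.find_library(name)
    if found:
        return found
    # Assumed fallback: scan directories listed in the given environment variables
    # for files named lib<name>.so or lib<name>.so.<version>.
    base = f"lib{name}.so"
    for var in env_vars:
        for directory in os.environ.get(var, "").split(os.pathsep):
            if not directory or not os.path.isdir(directory):
                continue
            for entry in sorted(os.listdir(directory)):
                if entry == base or entry.startswith(base + "."):
                    return os.path.join(directory, entry)
    return None


print(find_library_with_fallback("sndfile"))
```

Returning a full path from the fallback means the caller can pass the result directly to a dlopen-style loader without relying on the default search path.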


eessi-bot bot commented Jun 15, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:


eessi-bot bot commented Jun 15, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted


eessi-bot bot commented Jun 15, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12808

  • failed with errors when testing the extension torchvision of PyTorch-bundle...
=================================== FAILURES ===================================
___ test_decode_jpeg[None-ImageReadMode.UNCHANGED-grace_hopper_517x606.jpg] ____
test/test_image.py:94: in test_decode_jpeg
    img_ljpeg = decode_image(data, mode=mode)
/tmp/eb-7t6okia0/eb-js7oqjgv/tmpjpww4km2/lib/python3.11/site-packages/torchvision/io/image.py:236: in decode_image
    output = torch.ops.image.decode_image(input, mode.value)
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/PyTorch/2.1.2-foss-2023a/lib/python3.11/site-packages/torch/_ops.py:692: in __call__
    return self._op(*args, **kwargs or {})
E   RuntimeError: decode_jpeg: torchvision not compiled with libjpeg support
  • inspecting the job's individual build step logs (via bot/inspect.sh --resume previous_tmp/build_step/eessi.io-2023.06-software-1718457554.tgz, run in the job's working directory /project/def-users/SHARED/jobs/2024.06/pr_603/12808 on the same type of node, e.g., in an interactive job submitted with srun --partition x86-64-amd-zen2-node --time=60 --pty bash), we find the following messages in /tmp/eb-7t6okia0/eb-js7oqjgv/easybuild-run_cmd-9b5lqisq.log (the log file for building the extension torchvision)
  Compiling extensions with following flags:
    FORCE_CUDA: False
    FORCE_MPS: False
    DEBUG: False
    TORCHVISION_USE_PNG: True
    TORCHVISION_USE_JPEG: True
    TORCHVISION_USE_NVJPEG: True
    TORCHVISION_USE_FFMPEG: True
    TORCHVISION_USE_VIDEO_CODEC: True
    NVCC_FLAGS:
  Compiling with debug mode OFF
  Found PNG library
  Building torchvision with PNG image support
    libpng version: 1.6.39
    libpng include path: /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/libpng/1.6.39-GCCcore-12.3.0/include/libpng16
  Running build on conda-build: False
  Running build on conda: False
  Building torchvision without JPEG image support
  Building torchvision without NVJPEG image support
  • it looks like it doesn't find the jpeg library and hence builds without JPEG support
  • consequently, it later fails in the test step
  • the setup.py in /tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchvision/vision-0.16.2 that produces the above messages showing that torchvision is compiled without JPEG support includes a function find_library with the following code
    def find_library(name, vision_include):
        this_dir = os.path.dirname(os.path.abspath(__file__))
        build_prefix = os.environ.get("BUILD_PREFIX", None)
        is_conda_build = build_prefix is not None
    
        library_found = False
        conda_installed = False
        lib_folder = None
        include_folder = None
        library_header = f"{name}.h"
    
        # Lookup in TORCHVISION_INCLUDE or in the package file
        package_path = [os.path.join(this_dir, "torchvision")]
        for folder in vision_include + package_path:
            candidate_path = os.path.join(folder, library_header)
            library_found = os.path.exists(candidate_path)
            if library_found:
                break
  • running the build script (setup.py) manually in an "inspect" session revealed that the second parameter to find_library was an empty list []
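
The effect of the empty list can be reproduced in isolation. The sketch below mirrors only the header lookup loop from the excerpt above. `header_found` and `include_dirs_from_env` are hypothetical helpers, the header name `jpeglib` is an assumption, and splitting `TORCHVISION_INCLUDE` on `os.pathsep` is an assumption based on the comment in the excerpt rather than on the full setup.py.

```python
import os


def header_found(name, vision_include, package_dir="torchvision"):
    """Mirror the lookup loop: search <name>.h in the include dirs, then the package dir."""
    library_header = f"{name}.h"
    for folder in list(vision_include) + [package_dir]:
        if os.path.exists(os.path.join(folder, library_header)):
            return True
    return False


def include_dirs_from_env(var="TORCHVISION_INCLUDE"):
    """Assumed way of turning the environment variable into a list of directories."""
    value = os.environ.get(var)
    return value.split(os.pathsep) if value else []


# With an empty list only the package directory is searched, so a header that
# lives in a separate installation prefix is not found by this loop.
print(header_found("jpeglib", []))
print(header_found("jpeglib", include_dirs_from_env()))
```

In this reading, an empty second parameter means the loop can only succeed if the header is shipped inside the torchvision source tree, which is consistent with the "Building torchvision without JPEG image support" message in the log.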
date job status comment
Jun 15 12:04:28 UTC 2024 submitted job id 12808 awaits release by job manager
Jun 15 12:04:32 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 12:10:36 UTC 2024 running job 12808 is running
Jun 15 13:47:58 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-12808.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1718457726.tar.gz
size: 282 MiB (296485955 bytes)
entries: 9314
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jun 15 13:47:58 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-12808.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case


eessi-bot bot commented Jun 15, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/12809

  • failed in the sanity check for SentencePiece/0.2.0-GCC-12.3.0 with the following log messages
== 2024-06-15 12:40:44,834 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): Sanity check failed: sanity check command python -c 'import sentencepiece' exited with code 1 (output: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/SentencePiece/0.2.0-GCC-12.3.0/lib/python3.11/site-packages/sentencepiece/__init__.py", line 10, in <module>
    from . import _sentencepiece
ImportError: /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so.4: cannot allocate memory in static TLS block
) (at easybuild/framework/easyblock.py:3669 in _sanity_check_step)
date job status comment
Jun 15 12:04:32 UTC 2024 submitted job id 12809 awaits release by job manager
Jun 15 12:05:34 UTC 2024 released job awaits launch by Slurm scheduler
Jun 15 12:11:38 UTC 2024 running job 12809 is running
Jun 15 13:04:14 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-12809.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1718455310.tar.gz
size: 258 MiB (270844195 bytes)
entries: 9169
modules under 2023.06/software/linux/aarch64/generic/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Jun 15 13:04:14 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 12/12 test case(s) from 12 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-12809.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@EESSI deleted 9 comments from eessi-bot (bot) on Jun 15, 2024
@trz42
Collaborator Author

trz42 commented Jun 29, 2024

Rebuilding for zen2 to verify whether a new easyblock for torchvision fixes the issue that libjpeg couldn't be found...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account trz42 has NO permission to send commands to the bot


eessi-bot bot commented Jun 29, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:


eessi-bot bot commented Jun 29, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted


eessi-bot bot commented Jun 29, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.06/pr_603/13549

  • the installation of PyTorch-bundle succeeded, so the updated easyblock for torchvision works! 🎉
  • however, the build failed when checking for missing installations with
1 out of 138 required modules missing:

* grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)
date job status comment
Jun 29 20:55:20 UTC 2024 submitted job id 13549 awaits release by job manager
Jun 29 20:55:26 UTC 2024 released job awaits launch by Slurm scheduler
Jun 29 21:00:28 UTC 2024 running job 13549 is running
Jun 29 23:04:35 UTC 2024 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13549.out
❌ found message matching ERROR:
✅ no message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1719701425.tar.gz
size: 293 MiB (307397497 bytes)
entries: 10800
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
custom_ctypes/1.2.lua
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
librosa/0.10.1-foss-2023a.lua
LLVM/14.0.6-GCCcore-12.3.0-llvmlite.lua
NLTK/3.8.1-foss-2023a.lua
numba/0.58.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
PyTorch-bundle/2.1.2-foss-2023a.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
custom_ctypes/1.2
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
librosa/0.10.1-foss-2023a
LLVM/14.0.6-GCCcore-12.3.0-llvmlite
NLTK/3.8.1-foss-2023a
numba/0.58.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
PyTorch-bundle/2.1.2-foss-2023a
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jun 29 23:04:35 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 14/14 test case(s) from 14 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-13549.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
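
The missing-module report quoted in this comment has a regular shape, so it can be turned into structured data when tracking which installations are still missing. A minimal sketch based only on the format seen in this thread follows. `parse_missing_modules` is a hypothetical helper.

```python
import re

HEADER_RE = re.compile(r"(\d+) out of (\d+) required modules missing:")
# Entries look like: "* <module name> (<easyconfig file>.eb)"
ENTRY_RE = re.compile(r"^\*\s+(\S+)\s+\((\S+\.eb)\)\s*$", re.MULTILINE)


def parse_missing_modules(text):
    """Return counts and (module, easyconfig) pairs, or None if no report is present."""
    header = HEADER_RE.search(text)
    if header is None:
        return None
    return {
        "missing": int(header.group(1)),
        "required": int(header.group(2)),
        "modules": ENTRY_RE.findall(text),
    }


report = (
    "1 out of 138 required modules missing:\n"
    "\n"
    "* grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)\n"
)
print(parse_missing_modules(report))
```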

@boegel
Contributor

boegel commented Jul 5, 2024

Rebuilding for zen2 to verify whether a new easyblock for torchvision fixes the issue that libjpeg couldn't be found...

Maybe related to:

…-layer into debug-2023.06-software.eessi.io-PyTorch-2.1.2-foss-2023a
- PR EESSI#655 implements a general fix for the import error
@trz42
Collaborator Author

trz42 commented Aug 1, 2024

Rebuilding after #655 was merged to verify whether import soundfile in librosa's sanity check succeeds...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software
bot: build arch:aarch64/generic repo:eessi.io-2023.06-software


eessi-bot bot commented Aug 1, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:


eessi-bot bot commented Aug 1, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted


eessi-bot bot commented Aug 1, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_603/15500

  • installation of PyTorch-bundle succeeded, but then the check for missing installations failed with
1 out of 138 required modules missing:

* grpcio/1.57.0-GCCcore-12.3.0 (grpcio-1.57.0-GCCcore-12.3.0.eb)
  • librosa has already been ingested (hence its sanity check wasn't run at all)
| date | job status | comment |
| --- | --- | --- |
| Aug 01 07:12:23 UTC 2024 | submitted | job id 15500 awaits release by job manager |
| Aug 01 07:12:54 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Aug 01 07:18:58 UTC 2024 | running | job 15500 is running |
| Aug 01 09:08:30 UTC 2024 | finished | 😢 FAILURE (click triangle for details) |
Details
✅ job output file slurm-15500.out
❌ found message matching ERROR:
✅ no message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1722502307.tar.gz
size: 154 MiB (162347107 bytes)
entries: 6301
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
PyTorch-bundle/2.1.2-foss-2023a.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
PyTorch-bundle/2.1.2-foss-2023a
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Aug 01 09:08:30 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 16/16 test case(s) from 16 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-15500.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
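
The checks listed under Details in the report above can be approximated with a short script. This is a minimal sketch, not the bot's actual implementation: the function name `check_job_log`, the literal substring matching, and the rule that every check must pass for an overall success are assumptions inferred from the check marks shown in this thread. Only the five message patterns are taken from the report.

```python
# Message patterns copied from the "Details" lists in this thread.
# The boolean states whether the message should be present (True) or
# absent (False) in a successful build log. Matching them as literal
# substrings is an assumption of this sketch.
CHECKS = [
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def check_job_log(log_text):
    """Return (overall_ok, per_check) for the text of a job output file.

    per_check maps each pattern to True if that individual check passed.
    """
    per_check = {}
    for pattern, must_be_present in CHECKS:
        found = pattern in log_text
        per_check[pattern] = (found == must_be_present)
    return all(per_check.values()), per_check


if __name__ == "__main__":
    # Hypothetical log excerpt (not real job output) that triggers the
    # same combination of marks as reported for job 15500.
    sample = "\n".join([
        "ERROR: example error line",
        "1 out of 138 required modules missing:",
        "No missing installations",
        "/tmp/example.tar.gz created!",
    ])
    ok, details = check_job_log(sample)
    print("SUCCESS" if ok else "FAILURE")
    for pattern, passed in details.items():
        print(("ok  " if passed else "FAIL"), pattern)
```

Run against a saved `slurm-<jobid>.out` file, such a script would make it easy to see which of the five conditions caused a FAILURE verdict.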

eessi-bot bot commented Aug 1, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_603/15501

  • fails with the known segmentation fault while running the torchtext unit tests:
== 2024-08-01 07:37:12,561 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/tools/build_log.py:111 in caller_info): cmd "export PYTHONPATH=/tmp/eb-en1c7x64/eb-intmrk91/tmphzi6yecp/lib/python3.11/site-packages:$PYTHONPATH
&&  pytest test/torchtext_unittest -k "not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses"" and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" " exited with exit code -11 and output:
Fatal Python error: Segmentation fault

Current thread 0x000040003ebf5a00 (most recent call first):
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1268 in TestMaskTransform
  File "/tmp/bot/easybuild/build/PyTorchbundle/2.1.2/foss-2023a/torchtext/text-0.16.2/test/torchtext_unittest/test_transforms.py", line 1255 in <module>
  File "/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/assertion/rewrite.py", line 178 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
  • librosa has already been ingested (hence its sanity check wasn't run at all)
| date | job status | comment |
| --- | --- | --- |
| Aug 01 07:12:27 UTC 2024 | submitted | job id 15501 awaits release by job manager |
| Aug 01 07:12:52 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Aug 01 07:18:56 UTC 2024 | running | job 15501 is running |
| Aug 01 08:14:17 UTC 2024 | finished | 😢 FAILURE (click triangle for details) |
Details
✅ job output file slurm-15501.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1722497905.tar.gz
size: 142 MiB (149117531 bytes)
entries: 4815
modules under 2023.06/software/linux/aarch64/generic/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Aug 01 08:14:17 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 16/16 test case(s) from 16 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-15501.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
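
The doubled quotes in the `-k` argument of the pytest command quoted in the EasyBuild error above look odd, but a POSIX shell joins adjacent quoted strings into a single word. The following minimal sketch uses Python's `shlex`, which follows POSIX quoting rules, to show the filter expression pytest presumably received. The command string is copied from the log without the `export PYTHONPATH=... &&` prefix; the helper name `k_expression` is hypothetical.

```python
import shlex

# pytest command as it appears in the EasyBuild error message
# (PYTHONPATH export omitted).
CMD = (
    'pytest test/torchtext_unittest -k '
    '"not test_vocab_from_raw_text_file"" and not test_get_tokenizer_moses""'
    ' and not test_get_tokenizer_spacy"" and not test_download_charngram_vectors" '
)


def k_expression(command):
    """Return the word that follows -k after POSIX-style word splitting."""
    words = shlex.split(command)
    return words[words.index("-k") + 1]


if __name__ == "__main__":
    print(k_expression(CMD))
    # prints: not test_vocab_from_raw_text_file and not
    # test_get_tokenizer_moses and not test_get_tokenizer_spacy and not
    # test_download_charngram_vectors (on a single line)
```

If this reading is right, the quoting is not the cause of the failure; the segmentation fault is raised while `test_transforms.py` is being imported, as the traceback shows.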

@trz42
Collaborator Author

trz42 commented Aug 8, 2024

Rebuilding now that the changes have been minimised (only the hook for SentencePiece is kept for now) and #660 has been ingested...

bot: build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 8, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

eessi-bot bot commented Aug 8, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:x86_64/amd/zen2 repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software
  • handling command build architecture:x86_64/amd/zen2 repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 8, 2024

New job on instance eessi-bot-mc-aws for architecture x86_64-amd-zen2 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_603/15895

| date | job status | comment |
| --- | --- | --- |
| Aug 08 10:29:43 UTC 2024 | submitted | job id 15895 awaits release by job manager |
| Aug 08 10:30:06 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Aug 08 10:36:09 UTC 2024 | running | job 15895 is running |
| Aug 08 12:26:42 UTC 2024 | finished | 😁 SUCCESS (click triangle for details) |
Details
✅ job output file slurm-15895.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-1723118912.tar.gz
size: 154 MiB (162356101 bytes)
entries: 6302
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
PyTorch-bundle/2.1.2-foss-2023a.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/x86_64/amd/zen2/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
PyTorch-bundle/2.1.2-foss-2023a
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
2023.06/init/eessi_archdetect.sh
Aug 08 12:26:42 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 17/17 test case(s) from 17 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-15895.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

…-layer into debug-2023.06-software.eessi.io-PyTorch-2.1.2-foss-2023a
@trz42
Collaborator Author

trz42 commented Sep 3, 2024

Revisit switching off TCMALLOC...

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Sep 3, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

eessi-bot bot commented Sep 3, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Sep 3, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_603/17634

| date | job status | comment |
| --- | --- | --- |
| Sep 03 12:33:52 UTC 2024 | submitted | job id 17634 awaits release by job manager |
| Sep 03 12:34:22 UTC 2024 | released | job awaits launch by Slurm scheduler |
| Sep 03 12:40:25 UTC 2024 | running | job 17634 is running |
| Sep 03 13:53:08 UTC 2024 | finished | 😢 FAILURE (click triangle for details) |
Details
✅ job output file slurm-17634.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1725368522.tar.gz
size: 142 MiB (149130803 bytes)
entries: 4815
modules under 2023.06/software/linux/aarch64/generic/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Sep 03 13:53:08 UTC 2024 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17634.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@trz42
Collaborator Author

trz42 commented Sep 3, 2024
