feat: Fedora packaging #759

LecrisUT · 2024-01-30T17:19:47Z

This adds packaging CI/CD for Fedora:

When a new release is pushed, a pull-request is created at https://src.fedoraproject.org/rpms/dbcsr
For each PR/commit to develop, the packaging is tested in copr
(not yet implemented) Tests for packaging imports, e.g. via find_package. Needs some refactoring similar to spglib

To fully work the packit app needs to be installed on the repo. In the meantime the example run can be seen on my fork: LecrisUT#1

soft-depends-on: #757

jenkins-cscs · 2024-01-30T17:22:49Z

Can one of the admins verify this patch?

LecrisUT · 2024-01-30T17:27:34Z

Note that this CI/CD already highlights some issues: the current test builder is significantly slower on open-mpi vs mpich, and both are slower than serial. Even with #757 we see such problems

dev-zero · 2024-02-20T08:56:48Z

@LecrisUT thanks a lot for the work, I like it.
Quick question: what is the the cmake build type? I believe we configure only Release, Debug and Coverage (custom one) in cmake/CompilerConfiguration.cmake but neglected RelWithDebInfo so far, which may be popular with distros.

dev-zero · 2024-02-20T08:59:09Z

Also, one of the reasons to carry our own FindBLAS.cmake (FindLAPACK.cmake is only there since it pulls in FindBLAS.cmake) is the correct detection and selection of OpenMP-enabled OpenBLAS on some systems.
Do you know how that transfers over to flexiblas?

LecrisUT · 2024-02-20T22:58:24Z

Quick question: what is the the cmake build type? I believe we configure only Release, Debug and Coverage (custom one) in cmake/CompilerConfiguration.cmake

Good question, it's RelWithDebInfo. But regarding that design, please do not do it like that. Most of these options are meant for the user (in this case distro package) to control. Use target_compile_options instead with generator expression. But in most cases have faith that cmake defines it properly or if() it around a cmake option.

Also, one of the reasons to carry our own FindBLAS.cmake (FindLAPACK.cmake is only there since it pulls in FindBLAS.cmake) is the correct detection and selection of OpenMP-enabled OpenBLAS on some systems.

That one I did not research. Can you elaborate on it? I will port it upstream and propose a more portable approach if it's needed.

Do you know how that transfers over to flexiblas?

That's straightforward, just have the user (or in a preset) define the blas vendor. On the fedora script I don't do it because it's the only vendor available. Not sure about the nuance with openmp enabled though.

dev-zero · 2024-02-21T09:56:58Z

Quick question: what is the the cmake build type? I believe we configure only Release, Debug and Coverage (custom one) in cmake/CompilerConfiguration.cmake

Good question, it's RelWithDebInfo. But regarding that design, please do not do it like that. Most of these options are meant for the user (in this case distro package) to control. Use target_compile_options instead with generator expression. But in most cases have faith that cmake defines it properly or if() it around a cmake option.

Well, at the time when we started, this was the best/easiest way to get consistent flag configuration across multiple targets.
Do you have a suggestion on how to do this without repeating it many times? add_compile_options on the top-level dir?

Also, one of the reasons to carry our own FindBLAS.cmake (FindLAPACK.cmake is only there since it pulls in FindBLAS.cmake) is the correct detection and selection of OpenMP-enabled OpenBLAS on some systems.

That one I did not research. Can you elaborate on it? I will port it upstream and propose a more portable approach if it's needed.

Sure, the best documentation is the history of our FindBLAS.cmake:
https://github.com/cp2k/dbcsr/commits/develop/cmake/FindBLAS.cmake

In particular this is the diff against FindBLAS.cmake:
f9dca07

The point is that CP2K (and DBCSR by extension) calls BLAS/LAPACK routines from OpenMP parallel regions (and also from non-OpenMP parallel regions). Hence we need OpenBLAS to be built with OpenMP. FindBLAS.cmake at the point of import did automatically use the pthreaded OpenBLAS, but not the OpenMP one.
What the patch does is extending the list of OpenBLAS names to look for by openblas_openmp, and checking the detected openblas library for proper threading support.

To submit it to CMake I thought about adding a component to explicitly request openmp or pthread components.
On the other hand, one might want to make it more general since MKL for example has only sequential and threaded (same for cray-libsci).
And there is a distinction to made between threaded and thread-safe.

Do you know how that transfers over to flexiblas?

That's straightforward, just have the user (or in a preset) define the blas vendor. On the fedora script I don't do it because it's the only vendor available. Not sure about the nuance with openmp enabled though.

This then requires that we have a runtime check on the flexiblas provider to ensure people are not passing along a non-thread-compatible BLAS/LAPACK implementation. Essentially what we have in the FindBLAS.cmake patch, just for flexiblas.

dev-zero · 2024-02-21T09:58:53Z

fyi, not approving the workflows since the changes in this PR would not be covered by the tests (yet)

LecrisUT · 2024-02-21T13:53:58Z

BTW, this is a Fedora packaging PR, we should discuss it on a CMake modernization one/issue. If you can open an issue can copy the discussion, we can continue there

Do you have a suggestion on how to do this without repeating it many times? add_compile_options on the top-level dir?

It depends on where it is used. You can define a DBCSR_Base target.

The point is that CP2K (and DBCSR by extension) calls BLAS/LAPACK routines from OpenMP parallel regions (and also from non-OpenMP parallel regions). Hence we need OpenBLAS to be built with OpenMP. FindBLAS.cmake at the point of import did automatically use the pthreaded OpenBLAS, but not the OpenMP one.
What the patch does is extending the list of OpenBLAS names to look for by openblas_openmp, and checking the detected openblas library for proper threading support.

To submit it to CMake I thought about adding a component to explicitly request openmp or pthread components.
On the other hand, one might want to make it more general since MKL for example has only sequential and threaded (same for cray-libsci).
And there is a distinction to made between threaded and thread-safe.

Interesting, I will need some time to digest this, probably after the CECAM conference.

This then requires that we have a runtime check on the flexiblas provider to ensure people are not passing along a non-thread-compatible BLAS/LAPACK implementation. Essentially what we have in the FindBLAS.cmake patch, just for flexiblas.

Flexiblas is runtime selecting the BLAS/LAPACK. Maybe we can check if we can impose static restrictions there

hfp · 2024-05-16T11:49:06Z

If this PR targets the next release of CP2K/DBCSR, it would be the right time to finish/conclude it.

LecrisUT · 2024-05-16T11:53:20Z

Well one design question, do we want to implement the same structure as cp2k/cp2k repo? In either case this PR is finished and ready for review as-is. The only tasks to be done are on the project owner side to enable packit.

alazzaro · 2024-05-16T12:00:17Z

Sorry for the slow response, I will see if I have time to review it in June... In any case, DBCSR is tested in CP2K, where the machinery is already in place, so I don't think there is any rush...

oschuett · 2024-05-16T12:03:53Z

The only tasks to be done are on the project owner side to enable packit.

I've already enabled the Packit-as-a-Service app for this repo.

LecrisUT · 2024-05-16T12:07:13Z

The testing here with packit and the CI in CP2K are completely different in subtle ways. Packit testing in cp2k never picks up the changes here, and the changes here never check that packaging and changes in Fedora rawhide. The former are accounted for in the cp2k testing, but the latter should be addressed with this approach as I've mentioned some of the subtleties in the cp2k PR for this.

I would say the only issue here is, since cp2k has adopted a different integration approach here, do we wish to mirror that integration here? @oschuett opinion?

LecrisUT · 2024-05-16T12:11:25Z

/packit build

alazzaro · 2024-05-16T12:14:35Z

@LecrisUT sorry, I was not clear in my message... As I said, I will take care of your changes. We do not need to follow any CP2K repo rule here (DBCSR is a much smaller project, at some point it will become standalone in CP2K).
What I'm saying here is that there are no plans to make a new release of DBCSR beyond the current RC in CP2K, which we are already testing it.
Then, of course, I agree with you that we need to merge this PR... Please, consider DBCSR as an independent project.

LecrisUT · 2024-05-16T12:25:22Z

Please, consider DBCSR as an independent project.

Of course, I was not assuming any dependencies, and I will be leaving that design issue to upstream. Packit does not have the tools yet to do reverse dependency testing either.

As for the other comment, the design question still stands and I am in favor of consistency.

I would say the only issue here is, since cp2k has adopted a different integration approach here, do we wish to mirror that integration here? @oschuett opinion?

One softer design question there would be if the CI could be run more freely in this project's case

alazzaro · 2024-05-16T12:39:24Z

One softer design question there would be if the CI could be run more freely in this project's case

I think so, CI here is github with very low new additions...
And yes, we can reuse what you did in CP2K. Thanks for your contributions. I will be back to you in June. I see the checks are running and some are already green :)

hfp · 2024-05-16T12:39:56Z

This is unrelated, sorry...

"there are no plans to make a new release of DBCSR beyond the current RC in CP2K"

It would be nice to pick-up DBCSR's changes since the last RC. I changed some common bits in OpenCL which are necessary for CP2K/DBM status. In any case, I think the OpenCL BE might become separate as well (not the DBCSR kernel/s however). For the latter, I would perhaps change the license to BSD-3.

alazzaro · 2024-05-16T12:52:15Z

This is unrelated, sorry...

"there are no plans to make a new release of DBCSR beyond the current RC in CP2K"

It would be nice to pick-up DBCSR's changes since the last RC. I changed some common bits in OpenCL which are necessary for CP2K/DBM status. In any case, I think the OpenCL BE might become separate as well (not the DBCSR kernel/s however). For the latter, I would perhaps change the license to BSD-3.

@hfp We had a long discussion and preparation time ago on making DBCSR standalone. There is a ticket for that in CP2K, back to 2019, and the discussion was repeated in several other places during the cmake activity. Now I see @oschuett is moving CP2K to cmake, so I think it is good time for CP2K to use DBCSR as a standalone library. Then, it will be much easier to provide new DBCSR as needed.

alazzaro · 2024-06-18T20:29:00Z

/packit build

LecrisUT · 2024-06-19T05:17:26Z

The tests taking 2136.78 on openmpi vs 209.08 on mpich or 80.11 on serial is fairly suspicious. The tests are oversubscribed, but still the discrepancy is too high. I'll disable the tests for now, and come back with a PR to make it possible to run the testsuite externally

alazzaro · 2024-06-19T06:19:49Z

I'm checking this output: https://download.copr.fedorainfracloud.org/results/lecris/dbcsr/fedora-rawhide-x86_64/07625725-dbcsr/builder-live.log.gz

I see that the test which takes most of the time is:

Serial case

      Start 11: dbcsr_unittest1
11/17 Test #11: dbcsr_unittest1 .......................................   Passed   32.63 sec

MPICH case

      Start 11: dbcsr_unittest1
11/19 Test #11: dbcsr_unittest1 .......................................***Timeout 1500.00 sec

I can try to split this test in multiples, but clearly it doesn't solve the problem...

In our CI we do run with the following options:

dbcsr/.github/workflows/testing-linux.yml

Line 58 in d332b30

    
                     -DMPIEXEC_PREFLAGS="$([ "${{ matrix.mpi_suffix }}" = "openmpi" ] && echo "-mca btl ^openib --allow-run-as-root --oversubscribe")" \

According to the OpenMPI documentation, the --oversubscribe does bind to none. Can we try this option?

hfp · 2024-06-19T07:40:00Z

Making the workload run with OpenMPI using --oversubscribe is wrong. It usually means the launcher does not know how many processing elements (PE) in OpenMPI slang are available on the target system. In the past (4.x series of oMPI), I used, e.g., --map-by slot:PE=42 with 42 the number of cores (even with SMT=on), and then -npernode plus total rank-count specified with mpirun. For the 5-series, I am using something like --map-by ppr:$(((NRANKS+NS-1)/NS)):package:PE=$((NC/NRANKS)) with NRANKS per system, NS=<num-sockets>, and NC=<total-num-cores-per-system> which does not rely anymore on the deprecated -npernode flag.

alazzaro · 2024-06-19T07:43:58Z

Making the workload run with OpenMPI using --oversubscribe is wrong. It usually means the launcher does not know how many processing elements (PE) in OpenMPI slang are available on the target system. In the past (4.x series of oMPI), I used, e.g., --map-by slot:PE=42 with 42 the number of cores (even with SMT=on), and then -npernode plus total rank-count specified with mpirun. For the 5-series, I am using something like --map-by ppr:$(((NRANKS+NS-1)/NS)):package:PE=$((NC/NRANKS)) with NRANKS per system, NS=, and NC= which does not rely anymore on the deprecated -npernode flag.

In terms of performance, I agree with you, a proper binding is what we need. But in the context of the tests the goal is to check the correctness. I leave to @LecrisUT to decide what to do, the tests are running fine in our CI.

LecrisUT · 2024-06-19T07:50:19Z

Things are a bit muddy, because when building, the environment is very limited, and in the x86_64 case, there are only 2 cores available competing for both mpi and openmp resources. At least that's for upstream building in copr, downstream use a bit more resources so it fails less.

I am considering a refactor so that the testsuite looks like this making it more easy to run the testsuite on pre-installed libraries. Normally I would still prefer to do unit tests in the spec file, but in this case, these tests are the more expensive parts, so I'll think about it a bit.

hfp · 2024-06-19T07:52:15Z

I mean, if the rank-count is fixed and known and a lower number aka 2, one could use --map-by slot:PE=2 just to make it work (should be better than --oversubscribe given the system has at least two cores). If running on a single system, -npernode should be superfluous also.

alazzaro · 2024-06-19T07:53:23Z

Note that the most expensive tests we have are really serial, so likely I will go for a single rank...

LecrisUT · 2024-06-19T08:00:12Z

Note that the most expensive tests we have are really serial, so likely I will go for a single rank...

Technically yes, but because these use ctest parallelization, the overall time is less

I mean, if the rank-count is fixed and known and a lower number aka 2, one could use --map-by slot:PE=2 just to make it work (should be better than --oversubscribe given the system has at least two cores). If running on a single system, -npernode should be superfluous also.

It's not as clear cut because on aarch64 downstream, for example, a minimum of 12 core, sometimes 72 core builder is provided, and there most of the tests are run parallel within ctest, which can interfere with that option. Here is a downstream example.

Signed-off-by: Cristian Le <[email protected]>

LecrisUT mentioned this pull request Jan 30, 2024

feat: Add Fedora packaging via packit cp2k/cp2k#3256

Merged

LecrisUT force-pushed the feat/packit branch from 437b055 to 7c59b74 Compare January 30, 2024 17:48

LecrisUT mentioned this pull request May 17, 2024

Add CMake PROCESSORS properties #757

Merged

LecrisUT force-pushed the feat/packit branch from 7c59b74 to 1602599 Compare May 17, 2024 11:49

LecrisUT force-pushed the feat/packit branch from 273031d to 34d33bd Compare June 19, 2024 07:42

LecrisUT added 3 commits June 24, 2024 18:31

Add Fedora packaging via packit

cf29a89

Signed-off-by: Cristian Le <[email protected]>

Disable some linting tests

c9cc5aa

Signed-off-by: Cristian Le <[email protected]>

[Debug] Try to resolve openmpi test duration

7a6e261

Signed-off-by: Cristian Le <[email protected]>

LecrisUT force-pushed the feat/packit branch from 34d33bd to 7a6e261 Compare June 24, 2024 16:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Fedora packaging #759

feat: Fedora packaging #759

LecrisUT commented Jan 30, 2024 •

edited

Loading

jenkins-cscs commented Jan 30, 2024

LecrisUT commented Jan 30, 2024

dev-zero commented Feb 20, 2024

dev-zero commented Feb 20, 2024

LecrisUT commented Feb 20, 2024

dev-zero commented Feb 21, 2024 •

edited

Loading

dev-zero commented Feb 21, 2024

LecrisUT commented Feb 21, 2024

hfp commented May 16, 2024

LecrisUT commented May 16, 2024 •

edited

Loading

alazzaro commented May 16, 2024

oschuett commented May 16, 2024

LecrisUT commented May 16, 2024

LecrisUT commented May 16, 2024

alazzaro commented May 16, 2024

LecrisUT commented May 16, 2024

alazzaro commented May 16, 2024

hfp commented May 16, 2024

alazzaro commented May 16, 2024

alazzaro commented Jun 18, 2024

LecrisUT commented Jun 19, 2024

alazzaro commented Jun 19, 2024 •

edited

Loading

hfp commented Jun 19, 2024 •

edited

Loading

alazzaro commented Jun 19, 2024

LecrisUT commented Jun 19, 2024

hfp commented Jun 19, 2024

alazzaro commented Jun 19, 2024

LecrisUT commented Jun 19, 2024

feat: Fedora packaging #759

Are you sure you want to change the base?

feat: Fedora packaging #759

Conversation

LecrisUT commented Jan 30, 2024 • edited Loading

jenkins-cscs commented Jan 30, 2024

LecrisUT commented Jan 30, 2024

dev-zero commented Feb 20, 2024

dev-zero commented Feb 20, 2024

LecrisUT commented Feb 20, 2024

dev-zero commented Feb 21, 2024 • edited Loading

dev-zero commented Feb 21, 2024

LecrisUT commented Feb 21, 2024

hfp commented May 16, 2024

LecrisUT commented May 16, 2024 • edited Loading

alazzaro commented May 16, 2024

oschuett commented May 16, 2024

LecrisUT commented May 16, 2024

LecrisUT commented May 16, 2024

alazzaro commented May 16, 2024

LecrisUT commented May 16, 2024

alazzaro commented May 16, 2024

hfp commented May 16, 2024

alazzaro commented May 16, 2024

alazzaro commented Jun 18, 2024

LecrisUT commented Jun 19, 2024

alazzaro commented Jun 19, 2024 • edited Loading

hfp commented Jun 19, 2024 • edited Loading

alazzaro commented Jun 19, 2024

LecrisUT commented Jun 19, 2024

hfp commented Jun 19, 2024

alazzaro commented Jun 19, 2024

LecrisUT commented Jun 19, 2024

LecrisUT commented Jan 30, 2024 •

edited

Loading

dev-zero commented Feb 21, 2024 •

edited

Loading

LecrisUT commented May 16, 2024 •

edited

Loading

alazzaro commented Jun 19, 2024 •

edited

Loading

hfp commented Jun 19, 2024 •

edited

Loading