Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Fedora packaging #759

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open

feat: Fedora packaging #759

wants to merge 3 commits into from

Conversation

LecrisUT
Copy link
Contributor

@LecrisUT LecrisUT commented Jan 30, 2024

This adds packaging CI/CD for Fedora:

  • When a new release is pushed, a pull-request is created at https://src.fedoraproject.org/rpms/dbcsr
  • For each PR/commit to develop, the packaging is tested in copr
  • (not yet implemented) Tests for packaging imports, e.g. via find_package. Needs some refactoring similar to spglib

To fully work the packit app needs to be installed on the repo. In the meantime the example run can be seen on my fork: LecrisUT#1

soft-depends-on: #757

@jenkins-cscs
Copy link
Collaborator

Can one of the admins verify this patch?

@LecrisUT
Copy link
Contributor Author

Note that this CI/CD already highlights some issues: the current test builder is significantly slower on open-mpi vs mpich, and both are slower than serial. Even with #757 we see such problems

@dev-zero
Copy link
Contributor

@LecrisUT thanks a lot for the work, I like it.
Quick question: what is the the cmake build type? I believe we configure only Release, Debug and Coverage (custom one) in cmake/CompilerConfiguration.cmake but neglected RelWithDebInfo so far, which may be popular with distros.

@dev-zero
Copy link
Contributor

Also, one of the reasons to carry our own FindBLAS.cmake (FindLAPACK.cmake is only there since it pulls in FindBLAS.cmake) is the correct detection and selection of OpenMP-enabled OpenBLAS on some systems.
Do you know how that transfers over to flexiblas?

@LecrisUT
Copy link
Contributor Author

Quick question: what is the the cmake build type? I believe we configure only Release, Debug and Coverage (custom one) in cmake/CompilerConfiguration.cmake

Good question, it's RelWithDebInfo. But regarding that design, please do not do it like that. Most of these options are meant for the user (in this case distro package) to control. Use target_compile_options instead with generator expression. But in most cases have faith that cmake defines it properly or if() it around a cmake option.

Also, one of the reasons to carry our own FindBLAS.cmake (FindLAPACK.cmake is only there since it pulls in FindBLAS.cmake) is the correct detection and selection of OpenMP-enabled OpenBLAS on some systems.

That one I did not research. Can you elaborate on it? I will port it upstream and propose a more portable approach if it's needed.

Do you know how that transfers over to flexiblas?

That's straightforward, just have the user (or in a preset) define the blas vendor. On the fedora script I don't do it because it's the only vendor available. Not sure about the nuance with openmp enabled though.

@dev-zero
Copy link
Contributor

dev-zero commented Feb 21, 2024

Quick question: what is the the cmake build type? I believe we configure only Release, Debug and Coverage (custom one) in cmake/CompilerConfiguration.cmake

Good question, it's RelWithDebInfo. But regarding that design, please do not do it like that. Most of these options are meant for the user (in this case distro package) to control. Use target_compile_options instead with generator expression. But in most cases have faith that cmake defines it properly or if() it around a cmake option.

Well, at the time when we started, this was the best/easiest way to get consistent flag configuration across multiple targets.
Do you have a suggestion on how to do this without repeating it many times? add_compile_options on the top-level dir?

Also, one of the reasons to carry our own FindBLAS.cmake (FindLAPACK.cmake is only there since it pulls in FindBLAS.cmake) is the correct detection and selection of OpenMP-enabled OpenBLAS on some systems.

That one I did not research. Can you elaborate on it? I will port it upstream and propose a more portable approach if it's needed.

Sure, the best documentation is the history of our FindBLAS.cmake:
https://github.com/cp2k/dbcsr/commits/develop/cmake/FindBLAS.cmake

In particular this is the diff against FindBLAS.cmake:
f9dca07

The point is that CP2K (and DBCSR by extension) calls BLAS/LAPACK routines from OpenMP parallel regions (and also from non-OpenMP parallel regions). Hence we need OpenBLAS to be built with OpenMP. FindBLAS.cmake at the point of import did automatically use the pthreaded OpenBLAS, but not the OpenMP one.
What the patch does is extending the list of OpenBLAS names to look for by openblas_openmp, and checking the detected openblas library for proper threading support.

To submit it to CMake I thought about adding a component to explicitly request openmp or pthread components.
On the other hand, one might want to make it more general since MKL for example has only sequential and threaded (same for cray-libsci).
And there is a distinction to made between threaded and thread-safe.

Do you know how that transfers over to flexiblas?

That's straightforward, just have the user (or in a preset) define the blas vendor. On the fedora script I don't do it because it's the only vendor available. Not sure about the nuance with openmp enabled though.

This then requires that we have a runtime check on the flexiblas provider to ensure people are not passing along a non-thread-compatible BLAS/LAPACK implementation. Essentially what we have in the FindBLAS.cmake patch, just for flexiblas.

@dev-zero
Copy link
Contributor

fyi, not approving the workflows since the changes in this PR would not be covered by the tests (yet)

@LecrisUT
Copy link
Contributor Author

BTW, this is a Fedora packaging PR, we should discuss it on a CMake modernization one/issue. If you can open an issue can copy the discussion, we can continue there


Do you have a suggestion on how to do this without repeating it many times? add_compile_options on the top-level dir?

It depends on where it is used. You can define a DBCSR_Base target.

The point is that CP2K (and DBCSR by extension) calls BLAS/LAPACK routines from OpenMP parallel regions (and also from non-OpenMP parallel regions). Hence we need OpenBLAS to be built with OpenMP. FindBLAS.cmake at the point of import did automatically use the pthreaded OpenBLAS, but not the OpenMP one.
What the patch does is extending the list of OpenBLAS names to look for by openblas_openmp, and checking the detected openblas library for proper threading support.

To submit it to CMake I thought about adding a component to explicitly request openmp or pthread components.
On the other hand, one might want to make it more general since MKL for example has only sequential and threaded (same for cray-libsci).
And there is a distinction to made between threaded and thread-safe.

Interesting, I will need some time to digest this, probably after the CECAM conference.

This then requires that we have a runtime check on the flexiblas provider to ensure people are not passing along a non-thread-compatible BLAS/LAPACK implementation. Essentially what we have in the FindBLAS.cmake patch, just for flexiblas.

Flexiblas is runtime selecting the BLAS/LAPACK. Maybe we can check if we can impose static restrictions there

@hfp
Copy link
Member

hfp commented May 16, 2024

If this PR targets the next release of CP2K/DBCSR, it would be the right time to finish/conclude it.

@LecrisUT
Copy link
Contributor Author

LecrisUT commented May 16, 2024

Well one design question, do we want to implement the same structure as cp2k/cp2k repo? In either case this PR is finished and ready for review as-is. The only tasks to be done are on the project owner side to enable packit.

@alazzaro
Copy link
Member

Sorry for the slow response, I will see if I have time to review it in June... In any case, DBCSR is tested in CP2K, where the machinery is already in place, so I don't think there is any rush...

@oschuett
Copy link
Member

The only tasks to be done are on the project owner side to enable packit.

I've already enabled the Packit-as-a-Service app for this repo.

@LecrisUT
Copy link
Contributor Author

The testing here with packit and the CI in CP2K are completely different in subtle ways. Packit testing in cp2k never picks up the changes here, and the changes here never check that packaging and changes in Fedora rawhide. The former are accounted for in the cp2k testing, but the latter should be addressed with this approach as I've mentioned some of the subtleties in the cp2k PR for this.

I would say the only issue here is, since cp2k has adopted a different integration approach here, do we wish to mirror that integration here? @oschuett opinion?

@LecrisUT
Copy link
Contributor Author

/packit build

@alazzaro
Copy link
Member

@LecrisUT sorry, I was not clear in my message... As I said, I will take care of your changes. We do not need to follow any CP2K repo rule here (DBCSR is a much smaller project, at some point it will become standalone in CP2K).
What I'm saying here is that there are no plans to make a new release of DBCSR beyond the current RC in CP2K, which we are already testing it.
Then, of course, I agree with you that we need to merge this PR... Please, consider DBCSR as an independent project.

@LecrisUT
Copy link
Contributor Author

Please, consider DBCSR as an independent project.

Of course, I was not assuming any dependencies, and I will be leaving that design issue to upstream. Packit does not have the tools yet to do reverse dependency testing either.

As for the other comment, the design question still stands and I am in favor of consistency.

I would say the only issue here is, since cp2k has adopted a different integration approach here, do we wish to mirror that integration here? @oschuett opinion?

One softer design question there would be if the CI could be run more freely in this project's case

@alazzaro
Copy link
Member

One softer design question there would be if the CI could be run more freely in this project's case

I think so, CI here is github with very low new additions...
And yes, we can reuse what you did in CP2K. Thanks for your contributions. I will be back to you in June. I see the checks are running and some are already green :)

@hfp
Copy link
Member

hfp commented May 16, 2024

This is unrelated, sorry...

"there are no plans to make a new release of DBCSR beyond the current RC in CP2K"

It would be nice to pick-up DBCSR's changes since the last RC. I changed some common bits in OpenCL which are necessary for CP2K/DBM status. In any case, I think the OpenCL BE might become separate as well (not the DBCSR kernel/s however). For the latter, I would perhaps change the license to BSD-3.

@alazzaro
Copy link
Member

This is unrelated, sorry...

"there are no plans to make a new release of DBCSR beyond the current RC in CP2K"

It would be nice to pick-up DBCSR's changes since the last RC. I changed some common bits in OpenCL which are necessary for CP2K/DBM status. In any case, I think the OpenCL BE might become separate as well (not the DBCSR kernel/s however). For the latter, I would perhaps change the license to BSD-3.

@hfp We had a long discussion and preparation time ago on making DBCSR standalone. There is a ticket for that in CP2K, back to 2019, and the discussion was repeated in several other places during the cmake activity. Now I see @oschuett is moving CP2K to cmake, so I think it is good time for CP2K to use DBCSR as a standalone library. Then, it will be much easier to provide new DBCSR as needed.

@alazzaro
Copy link
Member

/packit build

@LecrisUT
Copy link
Contributor Author

The tests taking 2136.78 on openmpi vs 209.08 on mpich or 80.11 on serial is fairly suspicious. The tests are oversubscribed, but still the discrepancy is too high. I'll disable the tests for now, and come back with a PR to make it possible to run the testsuite externally

@alazzaro
Copy link
Member

alazzaro commented Jun 19, 2024

I'm checking this output: https://download.copr.fedorainfracloud.org/results/lecris/dbcsr/fedora-rawhide-x86_64/07625725-dbcsr/builder-live.log.gz

I see that the test which takes most of the time is:

  • Serial case
      Start 11: dbcsr_unittest1
11/17 Test #11: dbcsr_unittest1 .......................................   Passed   32.63 sec
  • MPICH case
      Start 11: dbcsr_unittest1
11/19 Test #11: dbcsr_unittest1 .......................................***Timeout 1500.00 sec

I can try to split this test in multiples, but clearly it doesn't solve the problem...

In our CI we do run with the following options:

-DMPIEXEC_PREFLAGS="$([ "${{ matrix.mpi_suffix }}" = "openmpi" ] && echo "-mca btl ^openib --allow-run-as-root --oversubscribe")" \

According to the OpenMPI documentation, the --oversubscribe does bind to none. Can we try this option?

@hfp
Copy link
Member

hfp commented Jun 19, 2024

Making the workload run with OpenMPI using --oversubscribe is wrong. It usually means the launcher does not know how many processing elements (PE) in OpenMPI slang are available on the target system. In the past (4.x series of oMPI), I used, e.g., --map-by slot:PE=42 with 42 the number of cores (even with SMT=on), and then -npernode plus total rank-count specified with mpirun. For the 5-series, I am using something like --map-by ppr:$(((NRANKS+NS-1)/NS)):package:PE=$((NC/NRANKS)) with NRANKS per system, NS=<num-sockets>, and NC=<total-num-cores-per-system> which does not rely anymore on the deprecated -npernode flag.

@alazzaro
Copy link
Member

Making the workload run with OpenMPI using --oversubscribe is wrong. It usually means the launcher does not know how many processing elements (PE) in OpenMPI slang are available on the target system. In the past (4.x series of oMPI), I used, e.g., --map-by slot:PE=42 with 42 the number of cores (even with SMT=on), and then -npernode plus total rank-count specified with mpirun. For the 5-series, I am using something like --map-by ppr:$(((NRANKS+NS-1)/NS)):package:PE=$((NC/NRANKS)) with NRANKS per system, NS=, and NC= which does not rely anymore on the deprecated -npernode flag.

In terms of performance, I agree with you, a proper binding is what we need. But in the context of the tests the goal is to check the correctness. I leave to @LecrisUT to decide what to do, the tests are running fine in our CI.

@LecrisUT
Copy link
Contributor Author

Things are a bit muddy, because when building, the environment is very limited, and in the x86_64 case, there are only 2 cores available competing for both mpi and openmp resources. At least that's for upstream building in copr, downstream use a bit more resources so it fails less.

I am considering a refactor so that the testsuite looks like this making it more easy to run the testsuite on pre-installed libraries. Normally I would still prefer to do unit tests in the spec file, but in this case, these tests are the more expensive parts, so I'll think about it a bit.

@hfp
Copy link
Member

hfp commented Jun 19, 2024

I mean, if the rank-count is fixed and known and a lower number aka 2, one could use --map-by slot:PE=2 just to make it work (should be better than --oversubscribe given the system has at least two cores). If running on a single system, -npernode should be superfluous also.

@alazzaro
Copy link
Member

Note that the most expensive tests we have are really serial, so likely I will go for a single rank...

@LecrisUT
Copy link
Contributor Author

Note that the most expensive tests we have are really serial, so likely I will go for a single rank...

Technically yes, but because these use ctest parallelization, the overall time is less

I mean, if the rank-count is fixed and known and a lower number aka 2, one could use --map-by slot:PE=2 just to make it work (should be better than --oversubscribe given the system has at least two cores). If running on a single system, -npernode should be superfluous also.

It's not as clear cut because on aarch64 downstream, for example, a minimum of 12 core, sometimes 72 core builder is provided, and there most of the tests are run parallel within ctest, which can interfere with that option. Here is a downstream example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants