Phase 4 Planning

Matthew Hambley edited this page Jul 31, 2024 · 9 revisions

Planning Phase 4

The main thrust of phase 4 development is to understand what has been developed in phases 2 and 3 now that our contractors have moved on. This will include polishing that work to get it into a fit state for full deployment.

Understanding the Code

First order of business is to gain a better understanding of where we stand. This will require examining and comprehending every source file. I see two practical ways to approach this.

The first is to complete the job of type hinting everything. So far it has only been done piecemeal. In order to gain full benefit from static type analysis it is needed everywhere.
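As a hypothetical illustration of what full type hinting buys us (the function and names below are invented for this sketch, not taken from the code base), consider a lookup helper that may legitimately return nothing:

```python
from pathlib import Path
from typing import Optional


def find_config(name: str, search_dirs: list[Path]) -> Optional[Path]:
    """Return the first path in `search_dirs` containing `name`, or None."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.exists():
            return candidate
    return None


# With the Optional[Path] annotation in place, a static checker such as
# mypy flags `find_config(...).read_text()` as a possible attribute access
# on None before the code ever runs. Without hints everywhere, that class
# of defect only surfaces at run time.
```

This is the "full benefit" referred to above: the checker can only follow None through the call graph if every function on the path is annotated.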

The other is to complete migrating anything of value from the old unit tests. This has the added advantage of tidying up dead code.

Both of these efforts could be performed file-at-a-time by the whole developer community, thus spreading the chance for learning around.

Beyond those, simply working on the code is likely to give the best learning opportunities. So further development should be considered part of the process of understanding the code.

Enhanced Testing

Having looked at the existing unit testing it seems to rely over-much on "mocking." I worry that it is testing the implementation rather than the function of the units it is testing.

I think we would be well served by redeveloping the tests, and potentially the units being tested, for better (less implementation-dependent) testing. I suspect we are also under-covered in our testing, so this would be an opportunity to add tests for corner cases, expected failure modes and all that good stuff.
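A small invented example of the distinction being drawn here (the unit and helper names are illustrative, not from the Fab code base): the first test pins how the unit is implemented, the second tests what it does.

```python
import unittest
from unittest import mock


# Hypothetical unit under test.
def is_fortran(path: str) -> bool:
    return path.endswith((".f90", ".F90"))


def fortran_sources(paths: list[str]) -> list[str]:
    return [p for p in paths if is_fortran(p)]


class ImplementationCoupledTest(unittest.TestCase):
    # Brittle: asserts that fortran_sources delegates to is_fortran, so a
    # harmless refactor (e.g. inlining the helper) breaks the test even
    # though behaviour is unchanged.
    def test_delegates_to_helper(self):
        with mock.patch(f"{__name__}.is_fortran",
                        return_value=True) as helper:
            fortran_sources(["a.f90"])
        helper.assert_called_once_with("a.f90")


class BehaviourTest(unittest.TestCase):
    # Robust: checks observable output only, so it survives refactoring.
    def test_selects_fortran_files(self):
        self.assertEqual(fortran_sources(["a.f90", "b.c", "c.F90"]),
                         ["a.f90", "c.F90"])
```

Heavy use of `mock.patch` in the existing suite tends to produce tests of the first kind; the redevelopment proposed above would aim for the second.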

This redevelopment also affords us further opportunity to learn the code base.

Finally there is the vexed question of the system tests. Currently these are largely end-to-end repeats of much of the unit testing. There is value in that, since it tests the whole system, but can we do better? In particular it would be good to have system tests which exercise the framework against example build systems. This would require suitable targets, which do not currently exist.

Enhanced Functionality

Although the basic functionality is present, as specified by the original project, there is still a lot missing: some "nice to have" features and a whole lot of usability and finesse. Often the way features are currently implemented is clunky and incomplete, e.g. making assumptions about the compiler in use.

Activities planned by BOM (high-level overview). I will add comments indicating the progress:

  • WIP means work in progress: we are working on it but are not done.
  • implemented means it is implemented in our fork but not yet merged into the main repository.
  • merged means it has been merged.

Our next steps then are:

  • Generalisation.
    It must be possible to call PSyclone with different APIs, and especially with no API (at the moment the API is hard-coded). In particular, when building LFRic with UM physics we will need to be able to run PSyclone with -api lfric (old style: -api dynamo0p3) and without an API to transform the UM physics (e.g. for GPU support):
    • implemented PR coming soon.
  • Support site-specific configurations.
    • WIP We are basically done, but need to gain some experience of exactly what a site-specific configuration will need to set, and of the best way to implement this. In terms of the new LFRic build system Baf: there are command line options --platform and --site (which default to the corresponding environment variables), which are used to find a site- and platform-specific directory, from which a Python script gets executed.
  • Support compiler version.
    Meaning versions are handled as integer tuples that can be compared, not just as strings, e.g. (1,2,3) < (1,4).
    • WIP Under code review.
  • Support MPI.
    A BuildConfig will specify whether it uses MPI; if no explicit compiler is chosen, it will pick a compiler and linker that support MPI or not accordingly.
    • WIP Waiting for compiler version support to be done (due to potential conflicts) and then code review; otherwise it seems to work fine.
  • Support OpenMP.
    While (as far as we know) all compilers support OpenMP, the BuildConfig will specify whether it should be used, and the compiler object will then automatically add the compiler-specific option for OpenMP.
    • WIP As for MPI (it is actually part of the same PR), so waiting for code review.
  • Better support for compiler wrappers
    E.g. mpif90 might be a wrapper around ifort, so both compilers should use the same flags, and it must be ensured that the wrapper and the wrapped compiler match (e.g. that if mpif90 wraps ifort version XXX, then version XXX is indeed what gets called when using mpif90).
  • Push linker-details into the site-specific configuration
    A Fab application script would only specify the names of the libraries required. E.g. an application might be linked with ["xios", "netcdf"], which would then be translated automatically into the site-specific list of -L and -l options (based on the chosen compiler).
  • Improved support for compiler flags.
    Flags can be grouped into modes ('fast-debug', 'release', ...), and the BuildConfig will specify which mode to use. A site-specific configuration can modify the pre-set flags for each compiler and each mode, or define new modes (e.g. 'tau-profile'). Flag settings can be made path-specific (adding or removing options for certain directories or files) and also compiler-version specific. In the end we should (likely) be able, say, to disable OpenMP for files X/Y/Z when the Intel compiler version is between this and that.
  • Using an object-oriented framework
    OO makes it easy for applications to re-use existing code and modify settings. For example, to apply different PSyclone optimisation scripts you would inherit from (say) the gungho Fab build and re-define only the function that determines which script is to be used.
  • Most settings will be command-line driven
    This will hopefully avoid the nested include logic in existing cylc build scripts. Example: $LFRIC_CORE/build.sh ./fab_lfric_atm.py --compiler intel-classic --mode fast-debug --site NCI --platform gadi --mpi --openmp. The --compiler flag automatically selects the whole suite, in this example ifort/icc, without the need to specify the compilers and linker explicitly (though this can be overridden, allowing you to use a different C compiler, say). We will likely also support the traditional, environment-variable-based way of selecting compilers, but in my opinion this makes the code much harder to understand (my preferred solution would be to use ... -cc "$CC" -cflags "$CFLAGS" etc., which makes it clear, at least to a novice, where settings that are often set in a module come from).
  • Known/important bug fixes:
    We'll add the bugs that we intend to fix here:
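The compiler version item above is easy to sketch. A minimal illustration (function name and error handling are our own, not the implementation under review) of why tuples beat strings for version comparison:

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted version string such as '1.2.3' into (1, 2, 3)."""
    try:
        return tuple(int(part) for part in text.strip().split("."))
    except ValueError as err:
        raise ValueError(f"Invalid version string: {text!r}") from err


# Tuples compare element-wise, which plain strings get wrong:
assert parse_version("1.2.3") < parse_version("1.4")
assert "10.1" < "9.0"                                 # lexicographic: wrong
assert parse_version("10.1") > parse_version("9.0")   # numeric: right
```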
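The linker item above can likewise be sketched. The mapping below is hypothetical (the dictionary contents, paths and function name are invented for illustration); the point is only that the application names libraries and the site configuration owns the -L/-l details:

```python
# Would live in the site-specific configuration, keyed by library name.
SITE_LIBS = {
    "xios":   ["-L/opt/xios/lib", "-lxios"],
    "netcdf": ["-L/opt/netcdf/lib", "-lnetcdff", "-lnetcdf"],
}


def link_flags(libs: list[str], site_libs: dict[str, list[str]]) -> list[str]:
    """Translate library names into the site's linker options, in order."""
    flags: list[str] = []
    for lib in libs:
        try:
            flags.extend(site_libs[lib])
        except KeyError:
            raise KeyError(f"Library {lib!r} is not configured for this site")
    return flags
```

An application asking for ["xios", "netcdf"] then gets the concatenated option lists without ever hard-coding a path.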
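The command-line-driven item could look roughly like the following argparse sketch. The option names mirror the example invocation above; defaults and help texts are assumptions, not a final design:

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse build options of the kind shown in the build.sh example."""
    parser = argparse.ArgumentParser(
        description="Build an LFRic application with Fab.")
    parser.add_argument("--compiler", default="intel-classic",
                        help="Selects the whole suite, e.g. ifort/icc.")
    parser.add_argument("--mode", default="fast-debug",
                        help="Flag mode, e.g. fast-debug or release.")
    parser.add_argument("--site", help="Site name (may default from the "
                        "environment).")
    parser.add_argument("--platform", help="Platform name within the site.")
    parser.add_argument("--mpi", action="store_true",
                        help="Build with MPI support.")
    parser.add_argument("--openmp", action="store_true",
                        help="Build with OpenMP support.")
    return parser.parse_args(argv)
```

Passing an explicit argv list keeps the function testable without touching sys.argv.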

This activity should include bug fixes so a trawl through the issues list is probably a good place to start.

The Shape of Success

What would a phase 4 release look like?

It should be suitable for not only building the UM and LFRic Atmosphere models but do so in a production appropriate way.

That is to say:

  • Full build of both projects. This includes unit and integration testing for LFRic.
  • Proven support for multiple compilers with easy switching between them.
  • Proven support for incremental builds.
  • Proven support for UM style pre-builds.