Resume nightly testing of GDASApp #1313

Open
RussTreadon-NOAA opened this issue Oct 8, 2024 · 1 comment

@RussTreadon-NOAA
Contributor

The ci directory contains the following scripts

driver.sh  gw_driver.sh  hera.sh  orion.sh  run_ci.sh  run_gw_ci.sh  stable_driver.sh

along with the validation directory.

stable_driver.sh was previously run via cron to

  • clone global-workflow develop
  • update jedi hashes in sorc/gdas.cd
  • build g-w with updated sorc/gdas.cd
  • run GDASApp ctests
  • if all tests pass, push the updated sorc/gdas.cd to GDASApp feature/stable-nightly
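
The gating in the steps above (each step must succeed before the next one runs, and the push happens only at the end) can be sketched as follows. This is a minimal sketch and not the contents of stable_driver.sh; run_nightly and the step names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run_nightly and the step names below are hypothetical,
# not taken from stable_driver.sh.

# Run each named step (a shell function or command) in order.
# Stop at the first failure so later steps, such as the push, never run.
run_nightly() {
    local step
    for step in "$@"; do
        echo "Running: ${step}"
        if ! "${step}"; then
            echo "FAILED: ${step}" >&2
            return 1
        fi
    done
    echo "All steps passed"
}

# Example call, assuming each step is defined as a shell function:
#   run_nightly clone_workflow update_hashes build_workflow run_ctests push_stable
```

Because the push is the last step in the list, a failed build or a failed ctest run leaves feature/stable-nightly untouched.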

The nightly cron was turned off after several failures.

This issue is opened to document the work needed to resume nightly testing of GDASApp.

@RussTreadon-NOAA
Contributor Author

I set up a working copy of the ci directory in my space on Hera, turned off mail to Cory and Guillaume, and executed stable_driver.sh. Everything ran fine up to

+ ctest -R gdasapp --output-on-failure

Some of the queued tests failed to run within 1500 seconds of being submitted. Downstream dependent jobs failed.

The following tests FAILED:
        1953 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasstage_ic_202103241200 (Timeout)
        1954 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasfcst_202103241200 (Timeout)
        1955 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasprepoceanobs_202103241800 (Timeout)
        1956 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasmarinebmat_202103241800 (Timeout)
        1957 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasmarineanlinit_202103241800 (Failed)
        1958 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasmarineanlvar_202103241800 (Failed)
        1959 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasmarineanlchkpt_202103241800 (Failed)
        1960 - test_gdasapp_WCDA-3DVAR-C48mx500_gdasmarineanlfinal_202103241800 (Failed)
        1966 - test_gdasapp_atm_jjob_var_inc (Failed)
        1967 - test_gdasapp_atm_jjob_var_final (Failed)
        1973 - test_gdasapp_atm_jjob_ens_inc (Failed)
        1974 - test_gdasapp_atm_jjob_ens_final (Failed)

As a result, ctest returned a non-zero return code and the working copy of develop with updated jedi hashes was not pushed to feature/stable-nightly.
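
To separate queue timeouts from genuine failures in a listing like the one above, the ctest summary can be tallied by reason. The following is a minimal sketch; summarize_ctest_failures is a hypothetical helper and is not part of stable_driver.sh. It assumes the summary lines have the form `<number> - <test name> (<reason>)`, as in the listing above.

```shell
#!/usr/bin/env bash
# Sketch only: summarize_ctest_failures is a hypothetical helper.

# Read ctest output on stdin and print "<reason> <count>" for each reason
# found in the failure summary, for example:
#   1953 - test_gdasapp_..._202103241200 (Timeout)
summarize_ctest_failures() {
    awk '
        /^[ \t]*[0-9]+ - / && match($0, /\([^()]+\)[ \t]*$/) {
            # Take the text after the opening parenthesis to end of line,
            # then drop the closing parenthesis and trailing blanks.
            reason = substr($0, RSTART + 1, RLENGTH)
            sub(/\)[ \t]*$/, "", reason)
            count[reason]++
        }
        END {
            for (r in count) print r, count[r]
        }
    ' | sort
}

# Example use, assuming ctest output was saved to log.ctest:
#   summarize_ctest_failures < log.ctest
```

A run where every failure is a Timeout or is downstream of one points at queue wait times rather than at the updated jedi hashes.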

We need a more robust mechanism to run the ctests. One option is to submit all the jobs to the debug queue. A potential problem is that Hera allows a user only two debug jobs in the queue at a time. stable_driver.sh runs the ctests sequentially, so this is not an issue by itself. However, if the user were running other debug jobs at the same time, there could be problems.
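
One way to guard against the two-job debug limit mentioned above is to poll the queue before each submission. This is a minimal sketch of that idea, not something proposed in stable_driver.sh; wait_for_debug_slot is hypothetical, and the command that counts the user's debug jobs is supplied by the caller. The squeue call in the usage comment is an assumption about how those jobs could be counted on Hera.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_debug_slot is a hypothetical helper.

# Usage: wait_for_debug_slot MAX_JOBS POLL_SECONDS MAX_POLLS COUNT_CMD [ARGS...]
# COUNT_CMD must print the user's current number of debug jobs.
# Returns 0 once the count is below MAX_JOBS, 1 if MAX_POLLS is exhausted,
# and 2 if COUNT_CMD itself fails.
wait_for_debug_slot() {
    local max_jobs=$1 poll_seconds=$2 max_polls=$3
    shift 3
    local polls=0 njobs
    while [ "$polls" -lt "$max_polls" ]; do
        njobs=$("$@") || return 2
        if [ "$njobs" -lt "$max_jobs" ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$poll_seconds"
    done
    return 1
}

# Example use (assumption: debug is a Slurm QOS and squeue is available):
#   count_debug_jobs() { squeue -h -u "$USER" -q debug | wc -l; }
#   wait_for_debug_slot 2 60 30 count_debug_jobs && ctest -R gdasapp --output-on-failure
```

Passing the counting command as an argument keeps the helper independent of the scheduler, so the same loop can be exercised with a stub.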

As a test, I set WORKFLOW_BUILD=OFF prior to the build. All 24 non-workflow ctests ran successfully. The following git commands in stable_driver.sh worked

++ cat log.ctest
++ grep 'tests passed'
+ npassed='100% tests passed, 0 tests failed out of 24'
+ '[' 0 -eq 0 ']'
+ echo 'Tests:                                 *SUCCESS*'
++ date
+ echo 'Tests: Completed at Fri Oct  4 02:11:29 UTC 2024'
+ echo 'Tests: 100% tests passed, 0 tests failed out of 24'
+ echo '```'
+ exit 0
+ ci_status=0
+ total=0
+ '[' 0 -eq 0 ']'
+ cd /scratch1/NCEPDEV/da/Russ.Treadon/CI/GDASApp/stable/20241004/global-workflow/sorc/gdas.cd
+ git stash
No local changes to save
+ total=0
+ '[' 0 -ne 0 ']'
+ git checkout feature/stable-nightly
warning: unable to rmdir 'sorc/bufr-query': Directory not empty
warning: unable to rmdir 'sorc/da-utils': Directory not empty
Switched to a new branch 'feature/stable-nightly'
M       parm/jcb-algorithms
M       parm/jcb-gdas
M       sorc/fv3-jedi
M       sorc/ioda
M       sorc/iodaconv
M       sorc/jcb
M       sorc/oops
M       sorc/saber
M       sorc/soca
M       sorc/ufo
M       sorc/vader
branch 'feature/stable-nightly' set up to track 'origin/feature/stable-nightly'.
+ total=0
+ '[' 0 -ne 0 ']'

The next git command, git merge develop, failed with

+ git merge develop
Note: Fast-forwarding submodule sorc/fv3-jedi to 731fcf4cbf541f37ac0531b2504fcc4108e1f6ee
Failed to merge submodule sorc/oops (commits don't follow merge-base)
CONFLICT (submodule): Merge conflict in sorc/oops
Recursive merging with submodules currently only supports trivial cases.
Please manually handle the merging of each conflicted submodule.
This can be accomplished with the following steps:
 - go to submodule (sorc/oops), and either merge commit e6485c0a
   or update to an existing commit which has merged those changes
 - come back to superproject and run:

      git add sorc/oops

   to record the above merge or update
 - resolve any other conflicts in the superproject
 - commit the resulting index in the superproject
Automatic merge failed; fix conflicts and then commit the result.
+ total=1
+ '[' 1 -ne 0 ']'
+ echo 'Unable to merge develop'
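
When the merge fails this way, it helps to report which submodules are in conflict rather than only "Unable to merge develop". The following is a minimal sketch; conflicted_submodules is a hypothetical helper and is not part of stable_driver.sh. It relies only on the `CONFLICT (submodule): Merge conflict in <path>` line shown in the output above.

```shell
#!/usr/bin/env bash
# Sketch only: conflicted_submodules is a hypothetical helper.

# Read `git merge` output on stdin and print the path of every submodule
# that git reported as conflicted, one per line.
conflicted_submodules() {
    sed -n 's/^CONFLICT (submodule): Merge conflict in \(.*\)$/\1/p'
}

# Example use, assuming the merge output is captured to a log first:
#   git merge develop > log.merge 2>&1 || conflicted_submodules < log.merge
```

For the output above this would print sorc/oops. While the merge is still in progress, `git diff --name-only --diff-filter=U` lists the unmerged paths in the superproject as a cross-check.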

@DavidNew-NOAA DavidNew-NOAA self-assigned this Oct 24, 2024