-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Make get_obs
tasks day-dependent in workflow; other improvements and bug fixes
#1137
base: develop
Are you sure you want to change the base?
[develop] Make get_obs
tasks day-dependent in workflow; other improvements and bug fixes
#1137
Conversation
…the tar file where the prepbufr files live changed"
…y Michelle Harrold, solution by Michael Kavulich.
…ntStat tasks' METplus log files.
…ing cycles for CCPA and MRMS but not yet for NDAS or NOHRSC.
…thout performing unnecessary repeated pulls.
… they're per-cycle or per-day.
…nup and comments.
…files from HPSS (and works with multiple cycles).
…e cleanup is happening.
…les, that are expected to be created once the task is finished actually get created. This is needed because it is possible that for some forecast hours for which there is overlap between cycles, the files are being retrieved and processed by the get_obs_... task for another cycle.
…nd EnsembleStat tasks such that GenEnsProd does not depend on the completion of get_obs_... tasks (because it doesn't need observations) but only forecast output while EnsembleStat does.
…d due to changes to dependencies of GenEnsProd tasks in previous commit(s).
…tending to time out for 48-hour forecasts.
…sure PcpCombine operates only on those hours unique to the cycle, i.e. for those times starting from the initial time of the cycle to just before the initial time of the next cycle. For the PcpCombine_obs task for the last cycle, allow it to operate on all hours of that cycle's forecast. This ensures that the PcpCombine tasks for the various cycles do not clobber each other's output. Accordingly, change the dependencies of downstream tasks that depend on PcpCombine obs output to make sure they include all PcpCombine_obs tasks that cover the forecast period of the that downstream task's cycle.
…ossibly also get_obs_ndas by putting in sleep commands.
@MichaelLueken @gspetro I found the documentation for how to make documentation here, so never mind! |
@gsketefian The documentation is also generated via Read the Docs automatically as a GHA. If you scroll to the bottom of the PR and click the |
@MichaelLueken @gspetro Do you know if sphinx available on Hera? I could not find a system version, but maybe someone has a personal version somewhere I could use. It would make it easier to build and view the results of changes to the documentation on the spot instead of having to push it to github and then wait for the auto build. |
Hi @MichaelLueken , I just saw that the python test for Also, for the data that needs to be staged for the new WE2E tests, do we do that before trying to run the tests or after? I mean, can we run the tests using the data in my directories for now (maybe that's what is automatically happening since that's where the test config files point to right now)? |
Hi @gsketefian, I'll try and see what might be happening with the generate_FV3LAM_wflow.py failure. Before final testing, the data for the new tests should be staged on Tier-1 platforms. Preliminary testing can be done with the data in your directories. As you noted, since the configuration files for the new WE2E tests are pointing to your directories, this is what is automatically happening currently. |
@gsketefian While attempting to run the
In |
…m the "platform" to the "verification" section to be consistent with the changes in config_defaults.yaml.
@MichaelLueken I fixed |
@gsketefian The last step before merging is adding the run_we2e_coverage_tests label to the PR, which will kick off the automated Jenkins tests. This is done after two approvals have been given. |
…mulative fields, not obs days for instantaneous fields (which is the default cycledef in verify_pre.yaml).
…and use it as the forecast output interval when performing vx.
…tiple times by different functions.
…le start times multiple times.
… and corresponding adjustments to them to be effective (i.e. in order for any necessary adjustments to make it into the rocoto xml file), move the call to the function that performs these checks and adjustments to a place BEFORE the call to extend_yaml() that "freezes" (hard-codes) the accumulations for which the PcpCombine and other tasks are run (this freezing should happen AFTER any adjustments are made to the list of user-specified accumulations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long review...I feel like I always drop these on a Friday evening, sorry about that!
Most of my review is trying to cut down the size of the very large scripts as much as possible (mostly from consolidating unnecessary variables). Also some cleanup suggestions from pylint
. The one major point that I'd like addressed is about the calling of python scripts via a system call rather than import, but we already have examples in the unit tests so that shouldn't be too hard.
I'll also be opening a PR into your branch with the changes from the smoke verification branch that I mentioned below.
@@ -160,7 +159,13 @@ def run_we2e_tests(homedir, args) -> None: | |||
# test-specific options, then write resulting complete config.yaml | |||
starttime = datetime.now() | |||
starttime_string = starttime.strftime("%Y%m%d%H%M%S") | |||
test_name = os.path.basename(test).split('.')[1] | |||
test_fn = os.path.basename(test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason you wanted to stray from the existing convention for test filenames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original naming convention allowed dots only right after the initial "config" and immediately before the "yaml" extension. For long test names, I wanted to have another separator character besides the underscore to make the test name more readable. For example, the new test config file
config.get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_00z_fcstlen_03hr.ncep-hrrr.yaml
was originally named
config.get_obs_hpss_do_vx_det_multicyc_cycintvl_24hr_inits_00z_fcstlen_03hr_ncep-hrrr.yaml
which I thought was quite confusing to read/understand. So I wanted to have another separator character for the major portions of the test name. The dot seemed the most appropriate, but you can see I've also used a dash. I'm open to suggestions for another naming convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is trying to cram a whole bunch of info about a test in the filename, which is always going to lead to a mess. I acknowledge you didn't start this problem (our existing test names aren't great either) but I think a better solution than making things even more unwieldy is to only include essential disambiguating information in the filename. We have the "description" field that can give more details, and we can grep config options to find out tests that have particular options enabled/disabled.
I pushed one possible change here; it includes the minimum info needed to tell the tests apart. I really think we could just go as far as vx_test_1
, vx_test_2
, etc. and have all the test info in the description, but that would have to be a larger conversation.
Since it doesn't show up well in the diffs apparently, these are the suggested renames:
- config.get_obs_hpss.do_vx_det.singlecyc.init_00z_fcstlen_36hr.winter_wx.SRW.yaml --> config.vx-det_long-fcst_winter-wx_SRW-staged.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_07hr_inits_vary_fcstlen_09hr.ncep-hrrr.yaml --> config.vx-det_multicyc_fcst-overlap_ncep-hrrr.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_00z_fcstlen_03hr.ncep-hrrr.yaml --> config.vx-det_multicyc_first-obs-00z_ncep-hrrr.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_21z_fcstlen_03hr.ncep-hrrr.yaml --> config.vx-det_multicyc_last-obs-00z_ncep-hrrr.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_96hr_inits_12z_fcstlen_48hr.nssl-mpas.yaml --> config.vx-det_multicyc_long-fcst-no-overlap_nssl-mpas.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_12z_fcstlen_48hr.nssl-mpas.yaml --> config.vx-det_multicyc_long-fcst-overlap_nssl-mpas.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_12z_fcstlen_03hr.nssl-mpas.yaml --> config.vx-det_multicyc_no-00z-obs_nssl-mpas.yaml
- config.get_obs_hpss.do_vx_det.multicyc.cycintvl_11hr_inits_vary_fcstlen_03hr.ncep-hrrr.yaml --> config.vx-det_multicyc_no-fcst-overlap_ncep-hrrr.yaml
… files can come from sources other than NDAS (e.g. GDAS).
@mkavulich Thanks for reviewing this PR. I realize it's a non-trivial expenditure of your time. I think I addressed all the comments, accepting almost all of them. I haven't yet merged your PR into my fork, but I'm going one step at a time. Could you take a look at the latest version and let me know your thoughts? I've tested it with 3 WE2E tests, and they all passed. After I get your responses to the changes, I will merge in your PR and also commit documentation changes that I still need to finish. Then I'll run a more comprehensive set of tests. Thanks. |
@gsketefian I am not seeing your latest changes (this is the only recent commit), did you commit/push them all? |
Oops, forgot to commit some of them before pushing. I think they're all there now. |
--obs_day ${PDY}" | ||
print_info_msg " | ||
CALLING: ${cmd}" | ||
${cmd} || print_err_msg_exit "Error calling ${script_bn}.py." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
${cmd} || print_err_msg_exit "Error calling ${script_bn}.py." | |
${cmd} || print_err_msg_exit "Error calling get_obs.py" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@gsketefian Just had one more suggested change and a couple replies, otherwise just waiting for the PR to your branch; we could take the conversation over there to keep this one less cluttered if you like |
DESCRIPTION OF CHANGES:
This PR fixes multiple bugs in the verification (vx) and other parts of the SRW App, the main one being that the
get_obs
tasks as well as some of the vx pre-processing tasks currently do not work for an experiment with multiple cycles if those cycles overlap in time (bug discovered by @michelleharrold and @willmayfield). Fixes and changes made by this PR are described in more detail below.Changes related to
get_obs
tasks:Make
get_obs
tasks in the ROCOTO workflow obs-day-based as opposed to cycle-based. Thus, for each day for which obs are needed for vx (and for each obs type that is needed for vx), there is now aget_obs
workflow task.Move the functionality in the ex-script
exregional_get_verif_obs.sh
to the new python scriptget_obs.py
. The newexregional_get_verif_obs.sh
is now a very short script that just callsget_obs.py
.The new
get_obs.py
script, along with changes tosetup.py
to calculate the times at which various types of obs need to be retrieved, ensure that no clobbering of retrieved obs files occurs (this currently does occur if cycles overlap).In
config_defaults.yaml
, introduce new variables specifying the obs availability interval for each of the four obs types (CCPA, NOHRSC, MRMS, NDAS) that might be retrieved. These variables are[CCPA|NOHRSC|MRMS|NDAS]_OBS_AVAIL_INTVL_HRS
.setup.py
now checks that multiple consistency constraints and requirements on the temporal vx parameters in the SRW configuration file (e.g. the accumulation periods, the obs availability intervals, the forecast output interval) are satisfied that would otherwise cause errors in the workflow. (setup.py
calls functions in the (renamed) scriptset_cycle_and_obs_timeinfo.py
to run these checks.) If such inconsistencies exist, the parameters are either adjusted to fix them or, if that is not possible, the experiment generation process is stopped.In
config_defaults.yaml
, introduce flags that determine whether or not to delete the raw obs directories and files that theget_obs
tasks create after the raw obs have been copied/moved/renamed to their final/processed locations. These new flags areREMOVE_RAW_OBS_[CCPA|NOHRSC|MRMS|NDAS]
.In
config_defaults.yaml
, move the base directories for the obs, i.e.[CCPA|NOHRSC|MRMS|NDAS]_OBS_DIR
, from theplatform
section to theverification
section so that they are near the METplus obs file name templates (OBS_...FN_TEMPLATE
) for which they serve as base directories.The processed/final files that the
get_obs
tasks create are now located and named as specified by the combination of the obs base directory (e.g.CCPA_OBS_DIR
) and the obs file name template (e.g.CCPA_APCP_FN_TEMPLATE
). Currently, the processed/final file that theget_obs
tasks first look for are, say for CCPA,{CCPA_OBS_DIR}/{CCPA_APCP_FN_TEMPLATE}
, but if these files don't exist and the obs need to be retrieved, the retrieved and processed file names are not necessarily given by this template. With this PR, the raw files are renamed and moved after retrieval to ensure they are located at{..._OBS_DIR}/{..._FN_TEMPLATE}
.Retrieve only 6-hourly NOHRSC snow accumulation obs, not 24-hourly accumulations. Currently, 24-hour accumulated obs are also retrieved (although there doesn't seem to be a WE2E test for it).
Modify the configuration file
parm/data_locations.yml
forretrieve_data.py
to extract all files in an archive at a time (i.e. per call toretrieve_data.py
) instead of extracting only one obs file out of an archive for each call toretrieve_data.py
. This speeds up the data retrieval significantly since a large portion of theget_obs
tasks' wallclock time is spent establishing a connection to HPSS.Modify
parm/data_locations.yml
to account for the change in prebpufr (NDAS) obs file names on May 22, 2024. This is currently causingget_obs_ndas
tasks to fail for cycles at or after this date. (Bug found by @michelleharrold.)Fix vx task dependencies to work with new obs-day-based
get_obs
tasks. Now, allget_obs
tasks (i.e. for all obs days) for a given obs type must be complete before any vx tasks for that obs type can launch. This doesn't cause any significant delay because theget_obs
tasks run in parallel and get at most one day's worth of obs.Changes related to vx pre-processing tasks (
PcpCombine_obs
andPb2nc_obs
):Add
PcpCombine_obs
tasks for both 6-hour and 24-hour accumulations of NOHRSC obs. The one for 6-hour accumulation simply converts the grib2 obs files to NetCDF, while the one for 24-hour accumulation adds the 6-hour grib2 obs to obtain a NetCDF file for 24-hour obs accumulations.Place all output from
PcpCombine_obs
tasks (both for CCPA and NOHRSC) under the cycle directories, just as is done for the analogousPcpCombine_fcst
tasks for forecasts. This is because accumulations, even for obs, depend on the start time of the cycle, e.g. 6-hour CCPA accumulations needed to verify a set of forecasts that start at 00Z will be different than 6-hour CCPA accumulations needed to verify a set of forecasts that start at 03Z. (Currently, the output files from these tasks are placed in themetprd
directory under the main experiment directory without consideration for the start times of the accumulations.)Make the
Pb2nc_obs
task for NDAS obs-day-dependent (unlike thePcpCombine_obs
tasks, which are cycle-dependent). This can be done because unlike accumulations, the result of thePb2nc_obs
task does not depend on the starting time of the cycles; it only depends on a given valid time. Also, keep the output of thePb2nc_obs
task in the cycle-independent directorymetprd
directly under the main experiment directory.Small, self-contained bug fixes and improvements:
Move evaluation of METplus time strings out of what used to be
set_vx_fhr_list.sh
(now renamed toset_leadhrs.sh
) and into a new bash script (bash_utils/eval_METplus_timestr_tmpl.sh
) to make it easier to change this functionality to python later on.Allow WE2E test names to include dots since dots (like underscores) are handy to use as separators in the test name.
Add the two new SRW config parameters
VX_CONFIG_[DET|ENS]_FN
inconfig_defaults.yaml
that specify the yaml configuration files to use for deterministic and ensemble verification. The default values for these are the filesvx_config_[det|ens]_fn.yaml
inparm/metplus
. These parameters allow a user to specify other user-created yaml files in this directory to use for the vx configuration so that the default files, which are under version control, do not have to be changed.Change some metatask and task names for clarity and consistency.
Add an option to
mrms_pull_topofhour.py
to not assume that there is a valid-date subdirectory under the specified source directory and to not add such a subdirectory under the specified output directory when generating output. This is handy when calling this script from the newget_obs.py
script.Fix bug in
parm/wflow/verify_det.yaml
so that all tasks have acycldefs
statement by default. This bug was causingGridStat
workflow tasks forCCPA
andNOHRSC
obs to be created for cycles not defined for the workflow (these extraneous cycles probably correspond to the default set of cycles that a task gets assigned by ROCOTO when it does not contain an explicitcycledefs
statement). (Bug found by @michelleharrold, solution by @mkavulich.)Fix bug in
scripts/exregional_run_met_gridstat_or_pointstat.sh
to append a string for the cycle date ("_YYYYMDDHH") to the name of the metplus log file for deterministicGridStat
andPointStat
tasks. This was causing the metplus log file for GridStat for a given cycle tasks to be overwritten by those for other cycles. (Bug found by @michelleharrold.)Fix bug in
parm/default_workflow.yaml
in "cycled_from_second" section in which the starting YYYYMMDDHH value of the cycledef can contain an HH value that is larger than 23. This currently happens because this HH is obtained directly fromINCR_CYCL_FREQ
without checking whether that value is less than 24.Fix bug In
launch_FV3LAM_wflow.sh
to change double quotes to single quotes to prevent failure in the interpretation of the command by cron.New WE2E tests added:
The following new WE2E tests were added to the
verification
subdirectory undertest_configs
to test various aspects of the new code:get_obs_hpss.do_vx_det.multicyc.cycintvl_07hr_inits_vary_fcstlen_09hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_11hr_inits_vary_fcstlen_03hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_00z_fcstlen_03hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_12z_fcstlen_03hr.nssl-mpas
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_12z_fcstlen_48hr.nssl-mpas
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_21z_fcstlen_03hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_96hr_inits_12z_fcstlen_48hr.nssl-mpas
get_obs_hpss.do_vx_det.singlecyc.init_00z_fcstlen_36hr.winter_wx.SRW
The purpose of each of these is described in the
description
section of the corresponding test config file.Type of change
TESTS CONDUCTED:
Three sets of WE2E tests were conducted on Hera/intel:
The "fundamental" suite consisting of:
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0
The existing verification tests, consisting of:
MET_ensemble_verification_only_vx
MET_ensemble_verification_only_vx_time_lag
MET_ensemble_verification_winter_wx
MET_verification
MET_verification_only_vx
MET_verification_winter_wx
The newly added get_obs/verification tests, consisting of:
get_obs_hpss.do_vx_det.multicyc.cycintvl_07hr_inits_vary_fcstlen_09hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_11hr_inits_vary_fcstlen_03hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_00z_fcstlen_03hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_12z_fcstlen_03hr.nssl-mpas
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_12z_fcstlen_48hr.nssl-mpas
get_obs_hpss.do_vx_det.multicyc.cycintvl_24hr_inits_21z_fcstlen_03hr.ncep-hrrr
get_obs_hpss.do_vx_det.multicyc.cycintvl_96hr_inits_12z_fcstlen_48hr.nssl-mpas
get_obs_hpss.do_vx_det.singlecyc.init_00z_fcstlen_36hr.winter_wx.SRW
All tests were successful.
DOCUMENTATION:
I am not familiar with the new RST file setup and will need help moving my documentation from comments in the code to the RST files.
CHECKLIST
Possibly; I'm not sure what exactly the documentation requirements are currently.
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@michelleharrold @mkavulich @JeffBeck-NOAA @willmayfield