Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad coupler diagnostics #44

Open
aekiss opened this issue Jul 6, 2023 · 18 comments
Open

Bad coupler diagnostics #44

aekiss opened this issue Jul 6, 2023 · 18 comments
Labels
bug Something isn't working mediator Related to the CMEPS mediator priority:low
Milestone

Comments

@aekiss
Copy link
Contributor

aekiss commented Jul 6, 2023

There seems to be a 32/64 bit mixup in some coupler diagnostics - see #40 (comment)

@aekiss aekiss added the bug Something isn't working label Jul 6, 2023
@MartinDix
Copy link

MartinDix commented Jul 6, 2023

I tried setting histaux_atm2med_file1_enabled = .true. in the UM test , but this causes a segfault.

Segfault was user error.

Diagnostic coupler fields here have the same 32/64 bit issue as OM3

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

Not sure if this helps, but is histaux_atm2med_file1_history_option compatible with the run length?

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

FYI

  • do grep hist archive/output000/log/med.log to see when data was written
  • histaux_*_flds is a colon-delimited list of fields to output, or set it to all to output everything

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

Using this commit I've output instantaneous snapshots of all fields every timestep for a 2-step run:

/scratch/v45/aek156/access-om3/archive/MOM6-CICE6_ACCESS-OM3-a8771ad/output000/GMOM_JRA.cpl.hx.atm.1step.inst.0001-01-01-03600.nc

This includes state fields AtmImp_Sa_* as well as fluxes AtmImp_Faxa_* (see CMEPS field naming convention). States and fluxes are both affected by the same bug.

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

The input files /g/data/ik11/inputs/cime/inputdata/2023-03-10/ocn/jra55/v1.3_noleap/*.nc all contain single-precision data:

for f in /g/data/ik11/inputs/cime/inputdata/2023-03-10/ocn/jra55/v1.3_noleap/*.1958.*.nc; do
   ncdump -h $f | grep "(time, latitude, longitude)"
done

yields

	float lwdn(time, latitude, longitude) ;
	float prec(time, latitude, longitude) ;
	float q_10(time, latitude, longitude) ;
	float slp(time, latitude, longitude) ;
	float swdn(time, latitude, longitude) ;
	float t_10(time, latitude, longitude) ;
	float u_10(time, latitude, longitude) ;
	float v_10(time, latitude, longitude) ;

Not sure where the conversion from single to double precision is occurring. Could be in DATM or CMEPS I suppose.

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

DATM's exported fields are all double precision

@MartinDix
Copy link

The diagnostic fields are written by routine med_io_write_FB in med_io_mod.F90. This has an optional argument use_float which is set true in the call to med_io_write from med_phases_history_write_comp_aux. Setting this false makes the variables in the diagnostic file doubles and all is ok.

Perhaps the PIO write isn't as flexible with mixed types as plain netCDF?

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

Aha! Thanks @MartinDix, I was also looking through that bit of code but you nailed it first.

For the record, here's the offending line:
https://github.com/ESCOMP/CMEPS/blob/606eb397d4e66f8fa3417e7e8fd2b2b4b3c222b4/mediator/med_phases_history_mod.F90#L1291

@aekiss
Copy link
Contributor Author

aekiss commented Jul 7, 2023

So is the underlying bug in med_io_write_FB? When luse_float is true this does type conversion for attributes but apparently not the data itself - does it assume pio_write_darray handles that?

@MartinDix
Copy link

If PIO passes things straight to netCDF the conversion should work. Need to have a look at the PIO tests.

@micaeljtoliveira
Copy link
Contributor

We need to check if this issue is still present in the latest version of the code.

@anton-seaice
Copy link
Contributor

It may be the pio_initdecomp:

https://github.com/ESCOMP/CMEPS/blob/7e0908cb958fc36002225efe00a3181f24c41c7a/mediator/med_io_mod.F90#L1026

or the lfillvalue,
https://github.com/ESCOMP/CMEPS/blob/7e0908cb958fc36002225efe00a3181f24c41c7a/mediator/med_io_mod.F90#L1061

which are both hardcoded double / real8 and don't adjust for use_float.

I suggest we just put in a patch for use_float=.false. and not worry any more than that?

@aekiss
Copy link
Contributor Author

aekiss commented Feb 16, 2024

#100 should fix this issue for us by using doubles, which is probably what we want to use anyway to properly check coupling is correct.

However it uses a patch, so it would be better to have this fixed upstream for when we update our CMEPS version.

I think we should post an issue on the CMEPS repo if this bug is still present on the CMEPS main branch. It still uses use_float=.true. but I haven't checked whether float support is still buggy in med_io_write.

@anton-seaice
Copy link
Contributor

I went for a patch because the real fix requires more investigation and changes - (assuming support for singles is needed). That probably doesn't preclude us raising an issue though.

Ok - ill do a build without the patch and using CMEPS main branch and see what happens.

@aekiss
Copy link
Contributor Author

aekiss commented Feb 16, 2024

thanks @anton-seaice

@anton-seaice
Copy link
Contributor

I built with CMEPS main and confirmed the error still exists:

Screenshot 2024-02-19 at 10 20 43 am

And raised with ESCOMP:
ESCOMP/CMEPS#430

Shall we close this issue? Or park it to see if there is a CMEPS fix?

@aekiss
Copy link
Contributor Author

aekiss commented Feb 19, 2024

Thanks @anton-seaice let's use your patch for now but leave the issue open until we adopt a fix from upstream, at which point we can remove your patch and close the issue.

@anton-seaice anton-seaice added the blocked For issues waiting resolution of issues outside this repository label Feb 19, 2024
@anton-seaice
Copy link
Contributor

The fix for this has gone in, see ESCOMP/CMEPS#488

we can remove the patch and test next time we update the components

@anton-seaice anton-seaice added 0.4.0 and removed blocked For issues waiting resolution of issues outside this repository labels Aug 1, 2024
@anton-seaice anton-seaice added this to the 0.4.0 milestone Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mediator Related to the CMEPS mediator priority:low
Projects
None yet
Development

No branches or pull requests

5 participants