Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow FieldTimeSeries to pass keyword arguments to jldopen #3739

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

ali-ramadhan
Copy link
Member

This PR takes some code I had to pass kwargs to jldopen through FieldTimeSeries.

But from the dicussion in issue #3680 it sounds like maybe a more future-proof way to achieve this is to add the kwargs object to the backend object (either InMemory or OnDisk)?

I guess there's a bit of a conflict of wording here? Backend refers to how the data is stored for a FieldTimeSeries. So we could end up with OnDisk{NetCDF} and OnDisk{JLD2} for example. And maybe also for PartlyInMemory. But for fully InMemory there's no "backend"?

Resolves #3680

@glwagner
Copy link
Member

Right InMemory is more like a nothing backend, there's nothing back there.

On ClimaOcean there's an example how we extended this infrastructure to a particular kind of NetCDF backend: https://github.com/CliMA/ClimaOcean.jl/blob/dd9148a8f702699ddf2947d3a1a094adddbd9bb1/src/DataWrangling/JRA55.jl#L218

The implementation in this PR seems simple enough though. @ali-ramadhan maybe can you provide a little syntax example how you expect this to be used? @navidcy take a look and see what you think

@glwagner glwagner requested a review from navidcy August 28, 2024 03:47
@ali-ramadhan
Copy link
Member Author

ali-ramadhan commented Aug 28, 2024

I've used it to open JLD2 files in parallel using multiple threads with @threads so that each thread can process one time snapshot. Here's an example:

u = FieldTimeSeries("fields.jld2", "u";
    backend = OnDisk(),
    backend_kw = Dict(:parallel_read => true)
)

Nt = length(u)
temporal_reduction = zeros(Nt)

Threads.@threads for n in 1:Nt
    temporal_reduction[n] = median(u[n])
end

Of course, this post-processing is much slower than just doing it on-the-fly as the simulation runs. But it's still useful to be able to do this.

One especially useful use case is parallel plotting from a FieldTimeSeries.

@glwagner
Copy link
Member

Nice!

Only API suggestion is that the extra kw could be directly passed to FieldTimeSeries, so

u = FieldTimeSeries("fields.jld2", "u"; backend = OnDisk(), parallel_read=true)

Under the hood this can work with something like

function FieldTimeSeries(path::String, name::String; backend=InMemory(), kw...)
    # etc
    backend_kw = Dict(kw)
    # etc
end

You're the primary user though so you can state your preference!

@ali-ramadhan
Copy link
Member Author

ali-ramadhan commented Aug 30, 2024

I agree that kw... results in a cleaner API with no passing of dicts.

I'm just wondering if there will be a use case in the future that will require passing a different set of kwargs. Then we'll need kw1, kw2 as kw... cannot disambiguate between kw1 and kw2.

@ali-ramadhan
Copy link
Member Author

I thought about this a bit more and further think we should stick to keeping kw a dict so that the API is more consistent with how the JLD2OutputWriter constructor works as it also expects a dict.

@glwagner
Copy link
Member

I thought about this a bit more and further think we should stick to keeping kw a dict so that the API is more consistent with how the JLD2OutputWriter constructor works as it also expects a dict.

We could also change JLD2OutputWriter!

@ali-ramadhan
Copy link
Member Author

Definitely agree the two interfaces should be consistent. I'd be okay with either, although still slightly favor the current interface in case we will want to pass two sets of kwargs in the future.

Would you support merging this PR and we can open an issue to discuss the best interface?

@glwagner
Copy link
Member

Yes for sure support that

@glwagner
Copy link
Member

Was hoping @navidcy might review but I can give it one

return FieldTimeSeries{LX, LY, LZ}(data, grid, backend, boundary_conditions,
indices, times, path, name, time_indexing)
return FieldTimeSeries{LX, LY, LZ}(data, grid, backend, boundary_conditions, indices,
times, path, name, time_indexing, backend_kw)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think theere needs to be changes to adapt_structure which I do not see in this PR. Possibly this is not tested right now. @simone-silvestri

Copy link
Member Author

@ali-ramadhan ali-ramadhan Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Ah true. I guess originally we didn't envision passing FieldTimeSeries into GPU kernels and I don't see a test that does this.

But with features like FieldTimeSeries forcing (PR #3760) an adapt_structure will be necessary. Actually not sure how tests will pass in that PR without adapting.

I can add some tests here though. There are no tests that use backend_kw...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think there are no tests about this. The adapt for a FieldTimeSeries returns a different type: GPUAdaptedFieldTimeSeries that is a little slimmer in terms of parameter space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok but this PR breaks FieldTimeSeries on GPU which will break everything in ClimaOcean. @simone-silvestri can you add some tests here?

@ali-ramadhan
Copy link
Member Author

ali-ramadhan commented Oct 7, 2024

Thanks @simone-silvestri for adding a GPU test for FieldTimeSeries! I've adapted it correctly now and tests pass locally so I think this PR is ready for review.

Actually I should add a test that uses the new kwargs. Would do a test where multiple threads open the same FieldTimeSeries but don't think we have multi-threaded tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support passing keyword arguments to jldopen when using FieldTimeSeries
3 participants