
Extend yt support #3

Open
8 of 11 tasks
hyschive opened this issue Jul 3, 2021 · 2 comments
Assignees
Labels
enhancement New feature or request paper Worthy to be put in a paper pri-medium Priority: medium

Comments

@hyschive
Contributor

hyschive commented Jul 3, 2021

Tasks

  • Support the following yt functionalities
    • OffAxisProjectionPlot
    • SlicePlot
    • OffAxisSlicePlot
    • Halo Analysis
    • Isocontours
    • volume_render (only if the MPI size is even)
    • ParticlePlot
    • ParticleProjectionPlot
    • LinePlot
  • Distinguish which yt operations should go inside the if suite:
    if yt.is_root():
    • The core of the parallelism is data access, so all operations that have nothing to do with accessing data can probably be placed inside this suite. (But this is my guess and should be checked further.)
    • For volume rendering, saving the rendered figure should NOT be inside the if yt.is_root(): clause. See Ask Non-Local Grid From Other MPI Rank #26
    • For some of the annotations, saving the figure should NOT be inside if yt.is_root():. See Plot Modifications / Annotations Test #35
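The rule of thumb above can be sketched with a serial toy model. Rank ids are simulated here for illustration; in a real inline script yt.is_root() makes this decision, and nothing below is the actual libyt implementation:

```python
# Toy model: data-dependent stages must run on every MPI rank, while
# output-only stages (e.g. saving a figure) usually run on the root rank.

def run_inline_analysis(rank, n_ranks, log):
    # Every rank participates in the data-dependent stage
    # (e.g. a ProjectionPlot reading its local grids).
    log.append((rank, "access_data"))
    # Only the root rank performs the data-independent stage
    # (e.g. saving the figure), mirroring `if yt.is_root():`.
    if rank == 0:
        log.append((rank, "save_figure"))

log = []
for rank in range(4):
    run_inline_analysis(rank, 4, log)

accesses = [r for r, op in log if op == "access_data"]
saves = [r for r, op in log if op == "save_figure"]
# All 4 ranks accessed data, but only rank 0 saved the figure.
```

As the tasks above note, volume rendering and some annotations are exceptions where the save step must also run on every rank.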

Notes

  • Better to work with Matt on this.
  • Some of the above functionalities have not been parallelized with grid decomposition in yt, so they may request grids that do not exist on the local rank.
  • Halo Analysis and Isocontours have not been tested yet.
  • Enzo's embedded Python analysis may not support particles?
  • Related issue Inline-analysis shut down when plot with select data #14
@hyschive hyschive added enhancement New feature or request help wanted Extra attention is needed paper Worthy to be put in a paper pri-medium Priority: medium labels Jul 3, 2021
@hyschive hyschive changed the title Support more yt functionalities Extend yt support Jul 3, 2021
@cindytsai cindytsai mentioned this issue Jul 3, 2021
2 tasks
@hyschive hyschive removed the help wanted Extra attention is needed label Jul 7, 2021
@cindytsai
Collaborator

cindytsai commented Aug 4, 2021

Hi @matthewturk ,
I've had some new thoughts on supporting yt inline analysis while writing the milestone and looking into it in more depth. Sorry about the false statement that yt functionalities must use parallel_objects to be able to do inline analysis. I want to clarify that here.

Counterexample

I'm using ProjectionPlot as an example. This one is already parallelized and works just fine in libyt/example with the inline script:

import yt
yt.enable_parallelism()

def yt_inline_ProjectionPlot(fields):
    # Every rank loads the in-memory dataset provided by libyt.
    ds = yt.frontends.libyt.libytDataset()
    # The projection itself is parallelized; all ranks participate.
    prjz = yt.ProjectionPlot(ds, "z", fields)
    # Only the root rank writes the figure to disk.
    if yt.is_root():
        prjz.save()

Code snippet of ProjectionPlot.

# yt/data_objects/construction_data_containers.py
class YTProj(YTSelectionContainer2D):
    def get_data(self, fields=None):
        ...
        with self.data_source._field_parameter_state(self.field_parameters):
            for chunk in parallel_objects(self.data_source.chunks([], "io", local_only=True)):
                if not _units_initialized:
                    self._initialize_projected_units(fields, chunk)
                    _units_initialized = True
                self._handle_chunk(chunk, fields, tree)
        ...

The iterable passed to parallel_objects is created with local_only=True, which yields only the grids located on this rank. So even if we don't use parallel_objects here, the final output will still be the same.

The following two changes in the for loop will give the same results.

  • Use parallel_objects to distribute jobs
    for chunk in parallel_objects(self.data_source.chunks([], "io", local_only=False)):
    • Since parallel_objects distributes jobs to each MPI rank in ascending order, and self.data_source.chunks also yields data chunks in ascending order of MPI rank, every rank ends up handling only its local grids.
  • Not using parallel_objects
    for chunk in self.data_source.chunks([], "io", local_only=True):
    • We neither need to collect results from each rank nor distribute jobs.
    • We directly use the data chunks on each MPI rank as the iterable.
    • We aren't using parallel_objects at all!
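The equivalence of the two loops can be sketched with a serial toy model of round-robin distribution. The names chunks_all_ranks and parallel_objects_sim are illustrative stand-ins, not the actual yt implementation, and the model assumes exactly one chunk per rank:

```python
# Toy model: round-robin job distribution over a rank-ordered chunk list
# assigns each rank exactly its own local chunk.

def chunks_all_ranks(grids_per_rank):
    # local_only=False: every rank sees the full, rank-ordered chunk list.
    for rank, grids in enumerate(grids_per_rank):
        yield rank, grids

def parallel_objects_sim(iterable, my_rank, n_ranks):
    # Round-robin: item i is processed by rank i % n_ranks.
    for i, item in enumerate(iterable):
        if i % n_ranks == my_rank:
            yield item

grids_per_rank = [["g0"], ["g1"], ["g2"]]  # one chunk per rank
n_ranks = 3

for my_rank in range(n_ranks):
    assigned = list(parallel_objects_sim(chunks_all_ranks(grids_per_rank),
                                         my_rank, n_ranks))
    # Each rank is assigned exactly the chunk that is local to it,
    # matching what local_only=True would have yielded directly.
    assert assigned == [(my_rank, grids_per_rank[my_rank])]
```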

Conclusion

  • We need parallel_objects when we have to collect the results from each rank after each has handled its local data.
    • For example, find_max collects the maximum from each rank.
      # yt/data_objects/derived_quantities.py
      class DerivedQuantity(ParallelAnalysisInterface):
          def __call__(self, *args, **kwargs):
              ...
              chunks = self.data_source.chunks([], chunking_style="io")
              storage = {}
              for sto, ds in parallel_objects(chunks, -1, storage=storage):
                  sto.result = self.process_chunk(ds, *args, **kwargs)
              ...
  • We need parallel_objects to distribute jobs that involve both local and non-local grids, as in the example above.
  • As long as yt handles only local grids, it's OK not to use parallel_objects, unless we need to collect data.
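The collect-then-reduce pattern in find_max above can be mimicked with a serial stand-in. This simulation runs all "ranks" in one process; in yt the communication is done by parallel_objects and ParallelAnalysisInterface, not by a plain dict:

```python
# Serial stand-in for the find_max pattern: each rank computes a partial
# result on its local chunk, a storage dict gathers the per-rank results,
# and a global reduction finishes the job.

local_chunks = {0: [3, 1, 4], 1: [1, 5, 9], 2: [2, 6, 5]}  # rank -> local data

storage = {}
for rank, chunk in local_chunks.items():
    # Plays the role of `sto.result = self.process_chunk(...)`.
    storage[rank] = max(chunk)

# After the loop, every rank's partial maximum is available,
# so the global maximum can be reduced from the storage dict.
global_max = max(storage.values())
```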

P.S.
I don't really know how yt works in parallel, but I hope this information helps. At first it was a little confusing to me that parallelized operations could avoid parallel_objects, because the ranks must communicate with each other in some way; I guess that communication is hidden somewhere in ParallelAnalysisInterface. Again, sorry for the false statement in the last meeting.

@cindytsai
Collaborator

Please do take your time.
There are some interesting things after comparing SlicePlot and ProjectionPlot: the former fails to run inline analysis in parallel (MPI size > 1), while the latter succeeds.

Both call the get_data method of the YTSelectionContainer class.

# yt/data_objects/selection_objects/data_selection_objects.py
class YTSelectionContainer(YTDataContainer, ParallelAnalysisInterface):
    def get_data(self, fields=None):
        ...
        # The _read method will figure out which fields it needs to get from
        # disk, and return a dict of those fields along with the fields that
        # need to be generated.
        read_fluids, gen_fluids = self.index._read_fluid_fields(
            fluids, self, self._current_chunk
        )
        ...
# yt/geometry/geometry_handler.py
class Index(ParallelAnalysisInterface, abc.ABC):
    def _read_fluid_fields(self, fields, dobj, chunk=None):
        ...
        fields_to_return = self.io._read_fluid_selection(
            self._chunk_io(dobj), selector, fields_to_read, chunk_size
        )
        ...
  • ProjectionPlot
    For ProjectionPlot, the dobj passed to self._chunk_io contains only the local grids that need to be read, so no error occurs when running in parallel.
  • SlicePlot
    However, for SlicePlot, dobj contains the ids of grids that don't even exist on the local rank, which of course leads to an error in the libyt frontend, because grid_data does not have the key g.id.
# yt/frontends/libyt/io.py
class IOHandlerlibyt(BaseIOHandler):
    def _read_fluid_selection(self, chunks, selector, fields, size):
        ...
        for field in fields:
            offset = 0
            ftype, fname = field
            for chunk in chunks:
                for g in chunk.objs:
                    if field_list[fname]["field_define_type"] == "cell-centered":
                        data_convert = self.grid_data[g.id][fname][:, :, :]
        ...
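The failure mode can be boiled down to a dict lookup with a non-local key. Everything below is a minimal sketch: the grid ids and data are made up, and the explicit miss detection is a hypothetical diagnostic, not the actual libyt fix:

```python
# Sketch: grid_data on each rank holds only local grids, so indexing with
# a non-local grid id raises KeyError. Detecting the miss explicitly shows
# which grids would have to be fetched from another rank.

grid_data = {0: "data-g0", 1: "data-g1"}   # grids local to this rank
requested_ids = [0, 1, 2]                   # grid 2 lives on another rank

missing = []
for gid in requested_ids:
    try:
        _ = grid_data[gid]                  # the lookup SlicePlot trips over
    except KeyError:
        missing.append(gid)                 # would need inter-rank communication
```

This is exactly the situation Ask Non-Local Grid From Other MPI Rank #26 is about: the frontend would need a way to obtain grid 2 from the rank that owns it.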
