
Extend yt support #3

Open
8 of 11 tasks
hyschive opened this issue Jul 3, 2021 · 2 comments
Assignees
Labels
enhancement New feature or request paper Worthy to be put in a paper pri-medium Priority: medium

Comments

@hyschive
Contributor

hyschive commented Jul 3, 2021

Tasks

  • Support the following yt functionalities
    • OffAxisProjectionPlot
    • SlicePlot
    • OffAxisSlicePlot
    • Halo Analysis
    • Isocontours
    • volume_render (only if the MPI size is even)
    • ParticlePlot
    • ParticleProjectionPlot
    • LinePlot
  • Distinguish which yt operations should go inside the if suite:
    if yt.is_root():
    • The core of the parallelism is data access, so all operations that have nothing to do with accessing data can probably be placed inside this suite. (But this is my guess and should be checked further.)
    • For volume rendering, saving the rendered figure should NOT be inside the if yt.is_root(): clause. See Ask Non-Local Grid From Other MPI Rank #26
    • For some of the annotations, saving the figure should NOT be inside if yt.is_root():. See Plot Modifications / Annotations Test #35
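The rule of thumb above can be sketched with a serial toy model. Rank ids are simulated here for illustration; in a real inline script yt.is_root() makes this decision, and nothing below is the actual libyt implementation:

```python
# Toy model: data-dependent stages must run on every MPI rank, while
# output-only stages (e.g. saving a figure) usually run on the root rank.

def run_inline_analysis(rank, n_ranks, log):
    # Every rank participates in the data-dependent stage
    # (e.g. a ProjectionPlot reading its local grids).
    log.append((rank, "access_data"))
    # Only the root rank performs the data-independent stage
    # (e.g. saving the figure), mirroring `if yt.is_root():`.
    if rank == 0:
        log.append((rank, "save_figure"))

log = []
for rank in range(4):
    run_inline_analysis(rank, 4, log)

accesses = [r for r, op in log if op == "access_data"]
saves = [r for r, op in log if op == "save_figure"]
# All 4 ranks accessed data, but only rank 0 saved the figure.
```

As the tasks above note, volume rendering and some annotations are exceptions where the save step must also run on every rank.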

Notes

  • Better to work with Matt on this.
  • Some of the above functionalities have not been parallelized with grid decomposition in yt, so they may request grids that do not exist on the local rank.
  • Halo Analysis and Isocontours have not been tested yet.
  • Enzo's embedded Python analysis may not support particles?
  • Related issue Inline-analysis shut down when plot with select data #14
@hyschive hyschive added enhancement New feature or request help wanted Extra attention is needed paper Worthy to be put in a paper pri-medium Priority: medium labels Jul 3, 2021
@hyschive hyschive changed the title Support more yt functionalities Extend yt support Jul 3, 2021
@cindytsai cindytsai mentioned this issue Jul 3, 2021
2 tasks
@hyschive hyschive removed the help wanted Extra attention is needed label Jul 7, 2021
@cindytsai
Collaborator

cindytsai commented Aug 4, 2021

Hi @matthewturk ,
I've had some new thoughts on supporting yt inline analysis while writing the milestone and looking into it in more depth. Sorry about the false statement that yt functionalities must use parallel_objects to be able to do inline analysis. I want to clarify that here.

Counterexample

I'm using ProjectionPlot as an example. This one is already parallelized and works just fine in libyt/example with the inline script:

import yt
yt.enable_parallelism()

def yt_inline_ProjectionPlot(fields):
    # Every rank loads the in-memory dataset provided by libyt.
    ds = yt.frontends.libyt.libytDataset()
    # The projection itself is parallelized; all ranks participate.
    prjz = yt.ProjectionPlot(ds, "z", fields)
    # Only the root rank writes the figure to disk.
    if yt.is_root():
        prjz.save()

Code snippet of ProjectionPlot.

# yt/data_objects/construction_data_containers.py
class YTProj(YTSelectionContainer2D):
    def get_data(self, fields=None):
        ...
        with self.data_source._field_parameter_state(self.field_parameters):
            for chunk in parallel_objects(self.data_source.chunks([], "io", local_only=True)):
                if not _units_initialized:
                    self._initialize_projected_units(fields, chunk)
                    _units_initialized = True
                self._handle_chunk(chunk, fields, tree)
        ...

The iterable passed to parallel_objects is created with local_only=True, which yields only the grids located on this rank. So even if we don't use parallel_objects here, the final output will still be the same.

The following two changes in the for loop will give the same results.

  • Use parallel_objects to distribute jobs
    for chunk in parallel_objects(self.data_source.chunks([], "io", local_only=False)):
    • Since parallel_objects distributes jobs to each MPI rank in ascending order, and self.data_source.chunks also yields data chunks in ascending order of MPI rank, every rank ends up handling only its local grids.
  • Not using parallel_objects
    for chunk in self.data_source.chunks([], "io", local_only=True):
    • We neither need to collect results from each rank nor distribute jobs.
    • We directly use the data chunks on each MPI rank as the iterable.
    • We aren't using parallel_objects at all!
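The equivalence of the two loops can be sketched with a serial toy model of round-robin distribution. The names chunks_all_ranks and parallel_objects_sim are illustrative stand-ins, not the actual yt implementation, and the model assumes exactly one chunk per rank:

```python
# Toy model: round-robin job distribution over a rank-ordered chunk list
# assigns each rank exactly its own local chunk.

def chunks_all_ranks(grids_per_rank):
    # local_only=False: every rank sees the full, rank-ordered chunk list.
    for rank, grids in enumerate(grids_per_rank):
        yield rank, grids

def parallel_objects_sim(iterable, my_rank, n_ranks):
    # Round-robin: item i is processed by rank i % n_ranks.
    for i, item in enumerate(iterable):
        if i % n_ranks == my_rank:
            yield item

grids_per_rank = [["g0"], ["g1"], ["g2"]]  # one chunk per rank
n_ranks = 3

for my_rank in range(n_ranks):
    assigned = list(parallel_objects_sim(chunks_all_ranks(grids_per_rank),
                                         my_rank, n_ranks))
    # Each rank is assigned exactly the chunk that is local to it,
    # matching what local_only=True would have yielded directly.
    assert assigned == [(my_rank, grids_per_rank[my_rank])]
```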

Conclusion

  • We need parallel_objects when we have to collect the results from each rank after each has handled its local data.
    • For example, find_max collects the maximum from each rank.
      # yt/data_objects/derived_quantities.py
      class DerivedQuantity(ParallelAnalysisInterface):
          def __call__(self, *args, **kwargs):
              ...
              chunks = self.data_source.chunks([], chunking_style="io")
              storage = {}
              for sto, ds in parallel_objects(chunks, -1, storage=storage):
                  sto.result = self.process_chunk(ds, *args, **kwargs)
              ...
  • We need parallel_objects to distribute jobs that involve both local and non-local grids, as in the example above.
  • As long as yt handles only local grids, it's OK not to use parallel_objects, unless we need to collect data.
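The collect-then-reduce pattern in find_max above can be mimicked with a serial stand-in. This simulation runs all "ranks" in one process; in yt the communication is done by parallel_objects and ParallelAnalysisInterface, not by a plain dict:

```python
# Serial stand-in for the find_max pattern: each rank computes a partial
# result on its local chunk, a storage dict gathers the per-rank results,
# and a global reduction finishes the job.

local_chunks = {0: [3, 1, 4], 1: [1, 5, 9], 2: [2, 6, 5]}  # rank -> local data

storage = {}
for rank, chunk in local_chunks.items():
    # Plays the role of `sto.result = self.process_chunk(...)`.
    storage[rank] = max(chunk)

# After the loop, every rank's partial maximum is available,
# so the global maximum can be reduced from the storage dict.
global_max = max(storage.values())
```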

P.S.
I don't really know how yt works in parallel, but I hope this information helps. At first it was a little confusing to me that parallelized operations could avoid parallel_objects, because the ranks must communicate with each other in some way; I guess that communication is hidden somewhere in ParallelAnalysisInterface. Again, sorry for the false statement in the last meeting.

@cindytsai
Collaborator

Please do take your time.
There are some interesting things after comparing SlicePlot and ProjectionPlot: the former fails to run inline analysis in parallel (MPI size > 1), while the latter succeeds.

Both call the get_data method of the YTSelectionContainer class.

# yt/data_objects/selection_objects/data_selection_objects.py
class YTSelectionContainer(YTDataContainer, ParallelAnalysisInterface):
    def get_data(self, fields=None):
        ...
        # The _read method will figure out which fields it needs to get from
        # disk, and return a dict of those fields along with the fields that
        # need to be generated.
        read_fluids, gen_fluids = self.index._read_fluid_fields(
            fluids, self, self._current_chunk
        )
        ...
# yt/geometry/geometry_handler.py
class Index(ParallelAnalysisInterface, abc.ABC):
    def _read_fluid_fields(self, fields, dobj, chunk=None):
        ...
        fields_to_return = self.io._read_fluid_selection(
            self._chunk_io(dobj), selector, fields_to_read, chunk_size
        )
        ...
  • ProjectionPlot
    For ProjectionPlot, the dobj passed to self._chunk_io contains only the local grids that need to be read, so no error occurs when running in parallel.
  • SlicePlot
    However, for SlicePlot, dobj contains the ids of grids that don't even exist on the local rank, which of course leads to an error in the libyt frontend, because grid_data does not have the key g.id.
# yt/frontends/libyt/io.py
class IOHandlerlibyt(BaseIOHandler):
    def _read_fluid_selection(self, chunks, selector, fields, size):
        ...
        for field in fields:
            offset = 0
            ftype, fname = field
            for chunk in chunks:
                for g in chunk.objs:
                    if field_list[fname]["field_define_type"] == "cell-centered":
                        data_convert = self.grid_data[g.id][fname][:, :, :]
        ...
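The failure mode can be boiled down to a dict lookup with a non-local key. Everything below is a minimal sketch: the grid ids and data are made up, and the explicit miss detection is a hypothetical diagnostic, not the actual libyt fix:

```python
# Sketch: grid_data on each rank holds only local grids, so indexing with
# a non-local grid id raises KeyError. Detecting the miss explicitly shows
# which grids would have to be fetched from another rank.

grid_data = {0: "data-g0", 1: "data-g1"}   # grids local to this rank
requested_ids = [0, 1, 2]                   # grid 2 lives on another rank

missing = []
for gid in requested_ids:
    try:
        _ = grid_data[gid]                  # the lookup SlicePlot trips over
    except KeyError:
        missing.append(gid)                 # would need inter-rank communication
```

This is exactly the situation Ask Non-Local Grid From Other MPI Rank #26 is about: the frontend would need a way to obtain grid 2 from the rank that owns it.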
