
suggested refactoring to avoid OOM errors #313

Open
hvgazula opened this issue Apr 1, 2024 · 1 comment
hvgazula (Contributor) commented Apr 1, 2024

Current:

def scalar_labels(self):
    return _labels_all_scalar([y for _, y in self.dataset.as_numpy_iterator()])

Suggestion:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def scalar_labels(self):
    # Map each label to a boolean flag instead of collecting all labels,
    # then reduce the flags to a single boolean.
    temp_ds = self.dataset.map(
        lambda _, y: tf.experimental.numpy.isscalar(y),
        deterministic=False,
        num_parallel_calls=AUTOTUNE,
    )
    return tf.math.reduce_all(list(temp_ds.as_numpy_iterator())).numpy()

Notes:

  1. The previous snippet collects all label volumes into a Python list (a memory hog, and hence the cause of the OOM) and then applies _labels_all_scalar.
  2. The refactored snippet maps each label volume through an isscalar check that returns a bool flag; the collected bool flags are then reduced to a single final bool.
  3. The other (naive) approach is to run nobrainer.tfrecord._is_int_or_float() on each element of the dataset (in a for loop) and then reduce all the bool flags, similar to step 2 (see the sketch after these notes).
  4. I am unsure why the GPU utilization is non-zero during this operation.
  5. I still maintain that the repeat should be delayed until after this operation; otherwise, the entire repeated dataset is iterated here, which is undesirable.
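
For reference, a rough sketch of the naive approach in note 3. It assumes nobrainer.tfrecord._is_int_or_float takes a single label (as yielded by as_numpy_iterator) and returns a bool, which is not verified here:

from nobrainer.tfrecord import _is_int_or_float

def scalar_labels_naive(self):
    # Check labels one at a time; nothing is accumulated into a list,
    # so peak memory stays at a single label volume.
    all_scalar = True
    for _, y in self.dataset.as_numpy_iterator():
        all_scalar = all_scalar and bool(_is_int_or_float(y))
        if not all_scalar:
            break  # one non-scalar label already decides the answer
    return all_scalar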

Caveat:

  1. This uses a tf.experimental function, which may (or may not) be deprecated in the future. @satra, what are your thoughts on using experimental features in the nobrainer API?
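
Regarding the caveat, here is a rough non-experimental variant, assuming a "scalar label" can be approximated by a rank-0 tensor (which may not match isscalar exactly for every label encoding):

import tensorflow as tf

def scalar_labels(self):
    # tf.rank(y) == 0 holds only for 0-d (scalar) label tensors, so reducing
    # the per-element flags with reduce_all answers "are all labels scalars?".
    temp_ds = self.dataset.map(
        lambda _, y: tf.equal(tf.rank(y), 0),
        deterministic=False,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return tf.math.reduce_all(list(temp_ds.as_numpy_iterator())).numpy()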
hvgazula (Contributor, Author) commented Apr 1, 2024

This suggestion works on a small dataset and doesn't fail; however, it is still time-consuming. The non-zero GPU utilization may be related to this (🤷‍♂️), since there is overhead in going back and forth between the CPU and GPU.
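
If the CPU↔GPU round trip is indeed the overhead, one unverified way to test that is to pin the final reduction to the host, reusing temp_ds from the suggested snippet above:

import tensorflow as tf

# The flags produced by the tf.data map already live in host memory, so
# placing the reduction on the CPU avoids copying them to the GPU.
with tf.device("/CPU:0"):
    all_scalar = tf.math.reduce_all(list(temp_ds.as_numpy_iterator())).numpy()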

hvgazula added a commit that referenced this issue Apr 1, 2024
also disable calculating scalar_labels, for more info #313
hvgazula added a commit that referenced this issue Apr 2, 2024
also disable calculating scalar_labels, for more info #313
hvgazula added this to the 1.2.1 milestone Apr 2, 2024