Dynamic slicing and batch size #5671

Open
rems75 opened this issue Oct 9, 2024 · 7 comments
Labels: question (Further information is requested), Video (Video related feature/question)

@rems75 commented Oct 9, 2024

Describe the question.

Hello everyone,

I'm trying to optimise a torch data-loading pipeline that involves video decoding, and thought I'd give DALI a try (I already tried things like PyNvVideoCodec, but that ended up quite slow). I have something more or less working, but at the cost of some suboptimal decisions, so I'm wondering whether I missed relevant options or whether DALI isn't well suited to my use case.

I have a set of N one-second videos, where N changes from batch to batch, and I want to extract a certain number of frames from each video, where the frame indices differ from video to video. From reading other posts, this does seem to be at the frontier of what DALI was designed for.

I have set up an ExternalInputCallable class with batch=False (in order to leverage parallelism), whose __call__ returns a video and a list of frame indices, and a pipeline based on fn.experimental.decoders.video.
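
Roughly, the setup looks like this (a trimmed-down sketch; the file list, frame-index lists, and batch size are placeholders, not my actual parameters):

```python
import numpy as np
from nvidia.dali import fn, pipeline_def


class ExternalInputCallable:
    def __init__(self, files, frame_indices):
        self.files = files                  # paths to the 1 s clips
        self.frame_indices = frame_indices  # per-video lists of frame indices

    def __call__(self, sample_info):
        i = sample_info.idx_in_epoch
        if i >= len(self.files):
            raise StopIteration
        # read the raw encoded bytes; decoding happens on the GPU in the pipeline
        with open(self.files[i], "rb") as f:
            encoded = np.frombuffer(f.read(), dtype=np.uint8)
        return encoded, np.array(self.frame_indices[i], dtype=np.int32)


@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def video_pipe(source):
    encoded, indices = fn.external_source(
        source=source, num_outputs=2, batch=False, parallel=True)
    frames = fn.experimental.decoders.video(encoded, device="mixed")
    return frames, indices
```

I then build the pipeline with the callable as the source and iterate with pipeline.run().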

The questions I have are the following:

  • how can I handle the dynamic number of videos per batch? I tried setting the pipeline batch size larger than the maximum number of videos and raising StopIteration in the external source, but that doesn't seem to work. I could reinitialise the pipeline with the appropriate batch size every time, but that seems wasteful.
  • is there a way to compose fn.element_extract and decoding at the sample level in the pipeline? Right now I'm doing the slicing per sample on a torch tensor built from each TensorGPU returned by pipeline.run() (see the sketch after this list), which feels very inefficient.
  • or maybe a different setup is more appropriate?
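
For concreteness, the per-sample slicing I do today looks roughly like this, continuing the sketch above (it assumes a DALI version whose TensorGPU supports the DLPack protocol; otherwise a copy via nvidia.dali.plugin.pytorch.feed_ndarray would be needed):

```python
import numpy as np
import torch

frames_out, idx_out = pipe.run()  # TensorListGPU (frames), TensorListCPU (indices)
clips = []
for i in range(len(frames_out)):
    # zero-copy view of the decoded video via DLPack
    video = torch.from_dlpack(frames_out[i])
    idx = torch.from_numpy(np.array(idx_out[i])).long().to(video.device)
    clips.append(video[idx])  # keep only the requested frames
```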

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
rems75 added the question (Further information is requested) label on Oct 9, 2024
@mzient (Contributor) commented Oct 9, 2024

Hello @rems75
I don't think you can use a parallel external source for your use case. Please try an ordinary external_source in batch mode; that way, the batch size for each iteration is determined by whatever the external_source produces.
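
Something along these lines (a rough sketch; get_next_videos is a stand-in for whatever produces the current batch of encoded clips and index lists):

```python
from nvidia.dali import fn, pipeline_def


def variable_batches():
    while True:
        encoded_list, index_list = get_next_videos()  # N changes per call
        if not encoded_list:
            return  # ends the epoch
        # the iteration's batch size is len(encoded_list)
        yield encoded_list, index_list


# batch_size here only acts as an upper bound on N
@pipeline_def(batch_size=64, num_threads=4, device_id=0)
def pipe():
    encoded, indices = fn.external_source(
        source=variable_batches(), num_outputs=2, batch=True)
    frames = fn.experimental.decoders.video(encoded, device="mixed")
    return frames, indices
```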

@rems75 (Author) commented Oct 10, 2024

Thanks for the answer @mzient. I'll be training on H100s, which have 7 NVDECs; will the ordinary external source in batch mode be able to leverage all of them? (In my case each batch will contain 20-30 one-second videos.)

Regarding the second question, any thoughts on extracting frames in the pipeline itself? Maybe a custom operator?

@JanuszL (Contributor) commented Oct 11, 2024

Hi @rems75,

Thank you for reaching out. Currently, only experimental.decoders.video attempts to decode multiple files in parallel; the other video readers/decoders work sequentially, using only one NVDEC at a time.

@rems75 (Author) commented Oct 11, 2024

Hi @JanuszL
Good to know, that's the one I've been using. Any recommendations on profiling tools to check how things are going?

@JanuszL (Contributor) commented Oct 11, 2024

Hi @rems75,

You can try using Nsight Systems and explore its video profiling capabilities.
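
For example, you can annotate your Python-side steps so they show up on the Nsight Systems timeline (a sketch using the nvtx package from PyPI):

```python
import nvtx

# marks the span on the NVTX row of the Nsight Systems timeline
with nvtx.annotate("pipeline.run", color="green"):
    frames, indices = pipe.run()
```

Then capture a report by running your script under nsys profile; the NVDEC activity appears on the GPU rows of the timeline, so you can see how many decoders are actually busy.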

@rems75 (Author) commented Oct 18, 2024

Thanks for the pointer @JanuszL, still looking into it.
In the meantime, any thoughts on doing the indexing directly in the decoder to save memory (and presumably speed things up)? Did I miss an option?

@JanuszL (Contributor) commented Oct 18, 2024

> In the meantime, any thoughts on doing the indexing directly in the decoder to save memory (and presumably speed things up)? Did I miss an option?

Let me add this to our ToDo list.

JanuszL added the Video (Video related feature/question) label on Oct 18, 2024