Achieving Higher FPS with Multiple Object Tracking #367

Open
daniaFrenel opened this issue Oct 10, 2024 · 1 comment


@daniaFrenel

Hey,
I am using SAM2 to analyze videos recorded at 30 FPS, currently tracking around 16 objects and achieving a tracking speed of only 2 FPS (propagating every 4 frames). I am interested in understanding the factors impacting this performance, for example, the number of frames loaded for the video predictor.

Could you provide insights on optimizations or adjustments that might help improve performance? For example, would loading object IDs directly into tensors enhance processing speed? Any guidance on potential changes to reach my desired FPS would be greatly appreciated.

Dania

@heyoeyo

heyoeyo commented Oct 11, 2024

There are a few changes that can be made to speed things up, but they'll generally come at the cost of accuracy. The time required per frame is (roughly) something like:

time per frame = E + M*n

Where:
  E is the image encoding time
  M is the masking + memory encoding + memory attention time
  n is the number of objects being tracked
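Plugging illustrative numbers into this model shows why the object count dominates. The values for E and M below are made-up examples for the sake of the arithmetic, not measured SAM2 timings:

```python
# Toy illustration of the timing model above. E and M are hypothetical
# per-frame costs in seconds, chosen only to make the arithmetic concrete.
def time_per_frame(E, M, n):
    """Rough per-frame cost: one image encoding, plus memory/masking per object."""
    return E + M * n

def fps(E, M, n):
    """Frames per second implied by the per-frame cost."""
    return 1.0 / time_per_frame(E, M, n)

# With hypothetical E = 0.1 s and M = 0.025 s per object:
print(fps(0.1, 0.025, 16))  # 16 objects -> 2.0 FPS
print(fps(0.1, 0.025, 4))   # 4 objects  -> 5.0 FPS
```

With these example numbers, tracking 16 objects lands at 2 FPS, so cutting either M (memory settings, resolution) or n (fewer objects) gives the biggest wins.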

The image encoding time can be decreased by switching to smaller models (e.g. using the tiny model) as well as running at a lower image resolution (see issue #257). Both of these changes can reduce segmentation quality/accuracy though.
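As a sketch, switching to the tiny model just means pointing the predictor at the tiny config/checkpoint. The exact filenames vary between SAM2 releases, so treat the paths below as assumptions and check against your install:

```python
# Sketch: building the video predictor with the tiny model.
# The config/checkpoint filenames below are assumptions -- verify them
# against the names shipped with your SAM2 release.
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "checkpoints/sam2_hiera_tiny.pt"  # assumed filename
model_cfg = "sam2_hiera_t.yaml"                # assumed filename
device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor(model_cfg, checkpoint, device=device)
```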
The time required to load the image could also be considered part of this timing and could be reduced by loading images in parallel to running the model itself, though it should be a relatively small part of the total time either way.
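A minimal sketch of that overlap, using a background thread to prefetch frames while the model runs (`load_frame` here is a hypothetical stand-in for whatever image-loading function you use):

```python
# Sketch: prefetch frames in a background thread so frame I/O overlaps
# with model inference. load_frame is a hypothetical I/O function.
import queue
import threading

def prefetch_frames(frame_paths, load_frame, maxsize=4):
    """Yield loaded frames while a worker thread loads the next ones."""
    q = queue.Queue(maxsize=maxsize)  # bounded, so we don't load everything at once

    def worker():
        for path in frame_paths:
            q.put(load_frame(path))
        q.put(None)  # sentinel: no more frames

    threading.Thread(target=worker, daemon=True).start()
    while True:
        frame = q.get()
        if frame is None:
            break
        yield frame

# Usage sketch:
#   for frame in prefetch_frames(paths, load_frame):
#       run_model(frame)   # hypothetical inference call
```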

The masking/memory time can be decreased by using fewer previous frames in the memory attention step as well as using a lower image resolution. Again, these changes can reduce the quality of the outputs.
Using fewer memory frames unfortunately requires changes to the code. If you want to try it, a simple hack is to edit line 539 in sam_base.py:

```python
num_prev_frames = 1  # values between 0 and 6 are valid
for t_pos in range(1, 1 + num_prev_frames):  # originally: range(1, self.num_maskmem)
```

It's very situational, but in some scenes it might also be possible to decrease the number of objects by using a prompt that masks several objects together in a single mask, though you would have to separate the results after the fact.
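One way to do that after-the-fact separation, assuming the grouped objects don't touch, is a connected-components split of the combined binary mask. A pure-NumPy sketch (in practice `scipy.ndimage.label` or `cv2.connectedComponents` would do the same job faster):

```python
# Sketch: split a combined binary mask into one mask per connected
# component (4-connectivity). Assumes grouped objects do not touch.
from collections import deque
import numpy as np

def split_mask(mask):
    """Return a list of boolean masks, one per connected component."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    parts = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # BFS flood fill from this unvisited foreground pixel
        comp = np.zeros_like(mask)
        todo = deque([(sy, sx)])
        seen[sy, sx] = True
        while todo:
            y, x = todo.popleft()
            comp[y, x] = True
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    todo.append((ny, nx))
        parts.append(comp)
    return parts
```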
