Hey,
I am using SAM2 to analyze videos recorded at 30 FPS and am currently tracking around 16 objects, achieving a tracking speed of only 2 FPS (propagating every 4 frames). I am interested in understanding the factors impacting this performance, for example the number of frames loaded for the video predictor.
Could you provide insights on optimizations or adjustments that might help improve performance? For example, would loading object IDs directly into tensors enhance processing speed? Any guidance on potential changes to reach my desired FPS would be greatly appreciated.
Dania
There are a few changes that can be made to speed things up, but they'll generally come at the cost of accuracy. The time required per frame is (roughly) something like:
time per frame = E + M*n
Where:
E is the image encoding time
M is the masking + memory encoding + memory attention time
n is the number of objects being tracked
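The formula above can be sketched as a small throughput model. The timing constants here are hypothetical placeholders, not measured values, but they illustrate why reducing either the encoding time E or the number of objects n helps:

```python
# Rough throughput model: time per frame = E + M*n, so FPS = 1 / (E + M*n).
# The example timings below are made up for illustration only.
def frames_per_second(encode_s, per_object_s, num_objects):
    """Estimate FPS from encoding time E, per-object time M, and object count n."""
    return 1.0 / (encode_s + per_object_s * num_objects)

# Hypothetical: E = 0.1 s, M = 0.025 s per object
fps_16_objects = frames_per_second(0.1, 0.025, 16)  # -> 2.0 FPS
fps_4_objects = frames_per_second(0.1, 0.025, 4)    # -> 5.0 FPS
```

Note that with many objects the M*n term dominates, so shrinking the model (which mostly reduces E) has less effect than reducing the object count.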
The image encoding time can be decreased by switching to smaller models (e.g. using the tiny model) as well as running at a lower image resolution (see issue #257). Both of these changes can reduce segmentation quality/accuracy though.
The time required to load the image could also be considered part of this timing and could be reduced by loading images in parallel to running the model itself, though it should be a relatively small part of the total time either way.
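The parallel loading mentioned above could look something like the following sketch, where a background thread prefetches frames into a bounded queue while the model runs. The `load_frame` function is a hypothetical stand-in for whatever decoding/loading the predictor does:

```python
# Sketch of overlapping frame loading with model execution, assuming a
# hypothetical load_frame(idx) -> frame function. Not SAM2's actual loader.
import queue
import threading

def prefetch_frames(load_frame, num_frames, buffer_size=4):
    """Yield frames in order while a worker thread loads ahead."""
    q = queue.Queue(maxsize=buffer_size)

    def worker():
        for idx in range(num_frames):
            q.put(load_frame(idx))
        q.put(None)  # sentinel: no more frames

    threading.Thread(target=worker, daemon=True).start()
    while True:
        frame = q.get()
        if frame is None:
            break
        yield frame

# Usage sketch:
# for frame in prefetch_frames(load_frame, total_frames):
#     run_model(frame)  # model runs while the next frame loads in the background
```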
The masking/memory time can be decreased by using fewer previous frames in the memory attention step as well as using a lower image resolution. Again, these changes can reduce the quality of the outputs.
Using fewer memory frames unfortunately requires changes to the code. If you want to try it, a simple hack is to edit line 539 in sam2_base.py:
num_prev_frames = 1  # Values between 0 and 6 are valid
for t_pos in range(1, 1 + num_prev_frames):  # self.num_maskmem):
It's very situational, but in some scenes it might also be possible to decrease the number of objects by using a prompt that masks several objects together in a single mask, though you would have to separate the results after the fact.
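Separating a combined mask afterwards could be done with connected-component labeling, assuming the objects do not touch in the frame. A minimal pure-Python sketch (in practice something like `scipy.ndimage.label` on the mask array would be faster):

```python
# Hedged sketch: split one combined 0/1 mask into per-object masks via
# 4-connectivity flood fill. Assumes merged objects are not touching.
def split_mask(mask):
    """Return a list of 2D masks, one per connected component of `mask`."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                comp = [[0] * w for _ in range(h)]
                stack = [(y, x)]
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp[cy][cx] = 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                components.append(comp)
    return components
```

This only works cleanly while the objects stay spatially separated; once they overlap or touch, the split becomes ambiguous and per-object prompts are the safer option.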