Real-time detection: would batching 30 frames together be faster, and how do I do it?

So I am trying to build a real-time object tracking algorithm.

My current pipeline, for each frame, is:

  1. Use OpenCV to obtain a webcam frame
  2. frame -> PyTorch model -> output
  3. Apply the object tracking algorithm

But I think batching the frames like this would be faster:

  1. Use OpenCV to obtain a webcam frame
  2. Stack 30 frames
  3. batch of 30 frames -> PyTorch model -> output
  4. Apply the object tracking algorithm

However, I am not sure how to batch 30 frames together. Can anyone give me some guidance?

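Hello, I suggest something like this:

# store the first frame in a list; each frame has shape (3, 500, 500)
>>> import torch
>>> frame = torch.rand(3, 500, 500)
>>> frames = [frame]

# inside the loop over frames, append each new frame
>>> frames.append(new_frame)

# after the loop, stack all frames along a new batch dimension
>>> batch_frame = torch.stack(frames)

If you loop 29 times, the list holds 30 frames and batch_frame will have a shape of (30, 3, 500, 500). Stack once after the loop rather than inside it: torch.stack requires all of its inputs to have the same shape, so a growing batch cannot be stacked with a single frame.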
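
As a rough sketch of the batched pipeline from the question, the snippet below converts OpenCV-style frames (H x W x 3 uint8 arrays) into one (N, 3, H, W) float tensor and runs a single forward pass. The helper name frames_to_batch, the small frame size, the random dummy frames, and the Conv2d stand-in model are all assumptions made for illustration; in a real pipeline the frames would come from the webcam and the model would be your detector.

```python
import numpy as np
import torch

BATCH_SIZE = 30   # frames per batch, as in the question
H, W = 120, 160   # small illustrative frame size (assumption)


def frames_to_batch(frames):
    """Convert a list of HxWx3 uint8 arrays to an (N, 3, H, W) float tensor."""
    tensors = [
        # HWC -> CHW, then scale pixel values to [0, 1]
        torch.from_numpy(f).permute(2, 0, 1).float().div(255.0)
        for f in frames
    ]
    # torch.stack adds a new leading batch dimension
    return torch.stack(tensors)


# Hypothetical stand-in for the real detector: any module taking (N, 3, H, W).
model = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()

# Dummy frames in place of real webcam captures. OpenCV returns BGR frames,
# so a channel-order conversion may be needed depending on the model.
frames = [
    np.random.randint(0, 256, (H, W, 3), dtype=np.uint8)
    for _ in range(BATCH_SIZE)
]

batch = frames_to_batch(frames)
with torch.no_grad():  # inference only, no gradients needed
    output = model(batch)

print(batch.shape)   # torch.Size([30, 3, 120, 160])
print(output.shape)  # torch.Size([30, 8, 120, 160])
```

On whether this is faster: batching usually improves throughput per frame, especially on a GPU, but the model cannot run until the batch is full. At 30 fps, waiting for 30 frames adds about one second of delay before any output is available, so a smaller batch may suit real-time tracking better.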