About Video loader

oasjd7 · June 18, 2018, 2:38am

Hi all,
To load video data, I need to clarify a few concepts.
In action recognition, each frame of a video has one label,
and in object detection, each frame of a video has multiple labels (bboxes). Is that correct?

ptrblck · June 18, 2018, 8:47am

It depends on the setup. Sometimes in action recognition the whole video sequence has one label, i.e. multiple frames are associated with a single label.
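As a rough sketch of the sequence-level case: the class below cuts each video into fixed-length clips and returns the video's single label with every clip. The name `ClipDataset`, the `clip_len` argument, and the `(frames, label)` layout are assumptions made up for illustration, not an existing API. It only implements `__len__` and `__getitem__`, which is the interface a map-style PyTorch dataset needs, so it should be possible to adapt it to your own frame loading.

```python
class ClipDataset:
    """Sketch of a map-style dataset where one label covers a whole clip."""

    def __init__(self, videos, clip_len):
        # videos: list of (frames, label) pairs (hypothetical layout);
        # frames is any sequence of per-frame data, label is one class id.
        self.videos = videos
        self.clip_len = clip_len
        # Precompute (video index, start frame) for every non-overlapping clip.
        self.index = []
        for v, (frames, _) in enumerate(videos):
            for start in range(0, len(frames) - clip_len + 1, clip_len):
                self.index.append((v, start))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        v, start = self.index[i]
        frames, label = self.videos[v]
        # All frames in the clip share the same label.
        return frames[start:start + self.clip_len], label


# Toy data: integers stand in for decoded frames.
videos = [(list(range(10)), 0), (list(range(7)), 1)]
dataset = ClipDataset(videos, clip_len=4)
clip, label = dataset[1]
```

Leftover frames at the end of a video that do not fill a complete clip are simply dropped here; padding or overlapping clips would be equally reasonable choices.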

Usually in object detection there can be multiple boxes in each frame. So yes, that’s correct.
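For the detection case, a minimal sketch could look like the following. The names `FrameDetectionDataset` and `collate_detection`, and the `(x1, y1, x2, y2, class_id)` annotation layout, are assumptions for illustration only. Since the number of boxes differs from frame to frame, the targets cannot be stacked into one tensor directly, so the collate function keeps them as a list. A function like this could be passed as `collate_fn` to `torch.utils.data.DataLoader`.

```python
class FrameDetectionDataset:
    """Sketch of a map-style dataset with a variable number of boxes per frame."""

    def __init__(self, frames, annotations):
        # annotations[i]: list of (x1, y1, x2, y2, class_id) tuples for frame i
        # (hypothetical layout); an empty list means no objects in that frame.
        assert len(frames) == len(annotations)
        self.frames = frames
        self.annotations = annotations

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, i):
        boxes = [ann[:4] for ann in self.annotations[i]]
        labels = [ann[4] for ann in self.annotations[i]]
        return self.frames[i], {"boxes": boxes, "labels": labels}


def collate_detection(batch):
    # Keep frames and targets as lists, because each frame
    # may come with a different number of boxes.
    frames = [frame for frame, _ in batch]
    targets = [target for _, target in batch]
    return frames, targets


# Toy data: strings stand in for decoded frames.
dataset = FrameDetectionDataset(
    ["frame0", "frame1"],
    [[(0, 0, 10, 10, 1), (5, 5, 20, 20, 2)], []],
)
frames, targets = collate_detection([dataset[0], dataset[1]])
```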