Multiprocess cv2 data loader

To start off, I’m not sure whether this is a Windows-only issue, since many objects aren’t picklable under Windows. If anyone knows for certain, please let me know. I could get access to a Linux machine as well.

Nonetheless, I’m trying to piece together a dataloader for a large set of very long videos. I need to sample random frames from these videos, so sequential decoding (which would be the fastest and wouldn’t need multiprocessing) is out of the question. I’ve been profiling different approaches, and from fastest to slowest they rank: cv2 (~3 seconds for 100 random indices), pyav (~10 seconds), ffmpeg (~20 seconds, using subprocesses). Unfortunately, running the dataloader with cv2 and multiple workers is not possible, as cv2.VideoCapture objects aren’t picklable. Creating the capture object on every item access is slower than the ffmpeg method, though.

In short, this is what the Dataset looks like:

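import cv2
from torch.utils.data import Dataset

class VideoDataset(Dataset):

    def __init__(self, vid_paths):
        # Opening the captures here is what breaks pickling with multiple workers.
        self.caps = [cv2.VideoCapture(vid_path) for vid_path in vid_paths]

    def grab_idcs(self, idx):
        ...

    # A map-style Dataset is indexed through __getitem__, not __iter__.
    def __getitem__(self, idx):
        vid_idx, frame_idx = self.grab_idcs(idx)
        cap = self.caps[vid_idx]
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()  # read() returns (success flag, frame)
        return frame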

The solution would be to create the list of capture objects within each worker, but I doubt there is an option to do that. Does anyone have an idea of how to avoid the pickling error?
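A minimal sketch of where the pickling error comes from. As far as I know, it depends on how worker processes are started rather than on Windows as such: with the spawn start method (the default on Windows) the dataset object is pickled and sent to each worker, while with fork (the traditional Linux default) it is inherited without pickling. The sketch runs without cv2 or a video file: FakeCapture is a hypothetical stand-in for cv2.VideoCapture, and a threading.Lock plays the role of the native handle that cannot be pickled.

```python
import pickle
import threading


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture: wraps an unpicklable handle."""

    def __init__(self, path):
        self.path = path
        self._handle = threading.Lock()  # locks, like native handles, cannot be pickled


class EagerDataset:
    """Opens every capture in __init__, like the Dataset shown above."""

    def __init__(self, vid_paths):
        self.caps = [FakeCapture(p) for p in vid_paths]


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


paths = ["a.mp4", "b.mp4"]  # placeholder paths, never opened
print(is_picklable(paths))                # the plain list of paths
print(is_picklable(EagerDataset(paths)))  # the dataset that holds open handles
```

The list of paths serializes without trouble, so a dataset that stores only paths and opens its captures later, inside the worker, should not hit the error.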

I found a solution, though it is still not quite fast enough. Just save the video paths in __init__ and initialize the capture objects on the first item access:

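import cv2
from torch.utils.data import Dataset

class VideoDataset(Dataset):

    def __init__(self, vid_paths):
        # Store only the paths, so the dataset can be pickled and sent to workers.
        self.vid_paths = vid_paths
        self.caps = None

    def grab_idcs(self, idx):
        ...

    # A map-style Dataset is indexed through __getitem__, not __iter__.
    def __getitem__(self, idx):
        vid_idx, frame_idx = self.grab_idcs(idx)
        if self.caps is None:
            # First access in this process: open the captures here.
            self.caps = [cv2.VideoCapture(vid_path) for vid_path in self.vid_paths]
        cap = self.caps[vid_idx]
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()  # read() returns (success flag, frame)
        return frame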
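
One caveat with the lazy approach: if the dataset is indexed in the main process before the workers start (for example, to inspect a sample), the captures are already open and the pickling error comes back. A possible guard, sketched here under assumptions, is to drop the handles in __getstate__ so they are never pickled. FakeCapture is again a hypothetical stand-in for cv2.VideoCapture so the snippet runs without cv2, and idx is assumed to already be a (video, frame) pair instead of going through grab_idcs.

```python
import pickle
import threading


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture (holds an unpicklable handle)."""

    def __init__(self, path):
        self.path = path
        self._handle = threading.Lock()

    def read_frame(self, frame_idx):
        # A real capture would seek and decode here; this just echoes its inputs.
        return (self.path, frame_idx)


class LazyVideoDataset:
    def __init__(self, vid_paths):
        self.vid_paths = list(vid_paths)
        self.caps = None  # opened on first access, once per process

    def __getstate__(self):
        # Drop open handles before pickling so each worker reopens its own.
        state = self.__dict__.copy()
        state["caps"] = None
        return state

    def __getitem__(self, idx):
        vid_idx, frame_idx = idx  # assumption: idx is already a (video, frame) pair
        if self.caps is None:
            self.caps = [FakeCapture(p) for p in self.vid_paths]
        return self.caps[vid_idx].read_frame(frame_idx)


ds = LazyVideoDataset(["a.mp4", "b.mp4"])
first = ds[(1, 42)]                     # opens the captures in this process
clone = pickle.loads(pickle.dumps(ds))  # still pickles: the handles were dropped
```

If the remaining slowness comes from workers reopening every capture at the start of each epoch, passing persistent_workers=True to the DataLoader might help. That is a guess, not something measured here.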