I have multiple video files (all of the same length) from which I create NTCHW-shaped tensors (alongside metadata, so TensorDataset is out).
It would be quite easy if I could create a dataset (per video file) with a DataLoader to create batches in time, then create a dataset over all the video files and a top-level DataLoader to bring them together into the desired shape.
But there doesn't seem to be a way to manage the num_workers of the two nested DataLoaders to spread the work around. Roughly:
```python
DataLoader(
    Dataset(
        TemporalDataLoader(
            VideoDataset(filename),
            num_workers=ABC,
        )
        for filename in list_of_files
    ),
    num_workers=DEF,
)
```
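To make the target shape concrete, here is a minimal runnable sketch of the nesting I have in mind, with toy tensors instead of real video decoding (`VideoDataset` here is a hypothetical stand-in, not an existing class), and no workers at all:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical stand-in: one "video" is a (T, C, H, W) tensor plus metadata.
class VideoDataset(Dataset):
    def __init__(self, frames, name):
        self.frames = frames   # (T, C, H, W)
        self.name = name       # metadata, kept alongside the tensor
    def __len__(self):
        return self.frames.shape[0]
    def __getitem__(self, t):
        return self.frames[t]  # one (C, H, W) frame

# Fake "files": 3 videos, each 8 frames of 3x4x4.
videos = [VideoDataset(torch.randn(8, 3, 4, 4), f"vid{i}") for i in range(3)]

# Inner loaders batch each video over time -> (T_batch, C, H, W).
inner = [DataLoader(v, batch_size=4, num_workers=0) for v in videos]

# Zip the per-video time-batches and stack -> (N, T_batch, C, H, W), i.e. NTCHW.
for chunks in zip(*inner):
    batch = torch.stack(chunks)
    print(batch.shape)  # torch.Size([3, 4, 3, 4, 4])
```

This single-process version works, but the open question is how to give the inner and outer loaders their own worker pools.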
I see there's a multiprocessing_context parameter in the DataLoader API. Is that the correct way to go? And what are the limitations of this approach, or is there a better one?
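For reference, this is the parameter I mean; my assumption is that it only controls how the worker processes are started (fork vs. spawn), and only matters when num_workers > 0:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10).float())

# multiprocessing_context chooses the start method for worker processes;
# workers are only created lazily, when the loader is iterated.
loader = DataLoader(
    ds,
    batch_size=5,
    num_workers=2,
    multiprocessing_context="spawn",
)
```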