You should be able to pass this dataset to a DataLoader and set the number of workers there, which would use multiple processes to create the batches. Or are you seeing any errors with it?
Hi, thanks for the reply. No, no errors, and I understand what you mean. The problem for me is the step that creates the torchvision.datasets.Kinetics400 instance (before feeding it to the DataLoader): it took too long and exceeded the time limit on my computing cluster. But I just realized I can pass num_workers to the dataset itself.
Yes, it seems you are right: you don’t need to wrap this Dataset into a DataLoader to parallelize its creation.
While the num_workers argument is shown in the signature in the docs, it’s unfortunately not explained in the parameter descriptions.
Inside the Kinetics400 dataset, a VideoClips object will be created, which accepts the num_workers argument as seen here.
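For reference, passing num_workers at construction time might look like the minimal sketch below. The helper name `build_kinetics400` and the default values are made up for illustration, and the call assumes a torchvision version that still ships `Kinetics400` with `frames_per_clip` and `num_workers` keyword arguments, so check the signature of your installed version.

```python
def build_kinetics400(root, frames_per_clip=16, num_workers=8):
    """Create the dataset while indexing the videos with several workers.

    Hypothetical helper; the defaults are placeholders, not recommendations.
    """
    # Imported lazily so the sketch can be imported without torchvision.
    from torchvision.datasets import Kinetics400

    return Kinetics400(
        root,
        frames_per_clip=frames_per_clip,
        # Forwarded to the internal VideoClips object, which uses it while
        # scanning the video files during dataset creation.
        num_workers=num_workers,
    )
```

Note that this only speeds up creating the dataset. To load training batches in parallel you would still pass the resulting dataset to a DataLoader with its own num_workers.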
Internally a DataLoader is created using the num_workers argument as seen here.
So it seems this “Dataset” inverts the usual logic of passing a Dataset instance into a DataLoader and uses a DataLoader internally instead.
I’m not sure if this approach is used for all video datasets.