Generating Data in Parallel

Hi Guys,

I am trying to generate data in parallel following this tutorial.

The tutorial assumes that my dataset is created like this:

training_generator = SomeSingleCoreGenerator('some_training_set_with_labels.pt')

I have never stored data in that format. My data is organized on disk as:

Dataset
ClassA
ClassB…

How can I convert my data into the format above, so I can proceed with the tutorial?

I think this example refers to the case where you use the built-in torch.utils.data.Dataset and torch.utils.data.DataLoader to build your dataset loader.
The DataLoader in particular will give you a generator like this.
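To make this concrete, here is a minimal sketch of a custom map-style Dataset fed to a DataLoader. The tensors here are synthetic stand-ins for your real samples; for an on-disk layout like Dataset/ClassA/…, Dataset/ClassB/…, torchvision.datasets.ImageFolder builds an equivalent (sample, class_index) dataset for you without writing a custom class.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Sketch of a map-style dataset; replace the synthetic tensors
    with code that loads your real samples (e.g. images per class folder)."""

    def __init__(self, n_samples=8, n_features=4):
        self.data = torch.randn(n_samples, n_features)   # fake inputs
        self.labels = torch.randint(0, 2, (n_samples,))  # fake class ids

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Loading/decoding one sample happens here; with num_workers > 0
        # the DataLoader calls this in worker processes, in parallel.
        return self.data[idx], self.labels[idx]

if __name__ == "__main__":
    dataset = ToyDataset()
    # num_workers > 0 is what actually parallelises the data loading
    loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=2)
    for x, y in loader:
        print(x.shape, y.shape)  # torch.Size([4, 4]) torch.Size([4])
```

The DataLoader is the "generator" the tutorial expects: you iterate over it and receive batched (input, label) pairs.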


Okay, I have a doubt. Before following this tutorial, I was doing data parallelism using the official PyTorch Data Parallelism tutorial (https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html#optional-data-parallelism).
It basically wraps the model in nn.DataParallel(model).
So nn.DataParallel(model) doesn’t have any effect on loading the input data in parallel, and the data is still being loaded on a single core only?

The DataParallel module runs your model in parallel on multiple GPUs. It is not related to loading data in parallel!
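The two knobs are orthogonal, which a short sketch can illustrate (toy model, not a full training loop; model and batch sizes here are arbitrary assumptions). nn.DataParallel splits each batch across GPUs for the forward/backward pass, while DataLoader's num_workers controls how many CPU processes fetch batches; on a machine without multiple GPUs, DataParallel simply falls through to the wrapped module.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(4, 2)
if torch.cuda.device_count() > 1:
    # Compute parallelism: replicate the model, split each batch across GPUs.
    model = nn.DataParallel(model).cuda()

dataset = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
# Loading parallelism: worker processes prepare batches, independently
# of whether the model is wrapped in DataParallel.
loader = DataLoader(dataset, batch_size=8, num_workers=2)

if __name__ == "__main__":
    x, _ = next(iter(loader))
    out = model(x)
    print(out.shape)  # torch.Size([8, 2])
```

So to load data with multiple cores, you set num_workers on the DataLoader; wrapping the model in DataParallel changes nothing about the input pipeline.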
