Applying convolution inside dataloader

I am trying to apply convolution/pooling inside the data loader's __getitem__, as follows:

def __getitem__(self, index):
    target = self.lbls[index]
    # read image and convert to PIL image
    I = openImage(self.fnArr[index])
    I = TF.to_pil_image(I, mode='RGB')
    # apply transformations, including ToTensor()
    I = self.transform(I)
    # here I need to transfer I to the GPU so that I can apply pooling
    I = I.cuda()  # I am getting the error here
    re = self.pool(I)
    return re, target

But I am getting the following error: "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method".
What should I do to overcome this?

When num_workers > 0, the DataLoader forks worker processes under the hood to load your data in parallel, and, as the error says, PyTorch doesn't support initializing CUDA in a forked subprocess; you would have to use the 'spawn' start method via torch.multiprocessing.

Why does the convolution/pooling need to be done in the dataloader? You could just do it as the first step in your main process as soon as you get the data from the dataloader.
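As a sketch of what that could look like (the data here is random stand-in tensors, not the asker's actual dataset): keep the Dataset purely on the CPU, then move each batch to the GPU and pool it in the main process.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data; the real code would load images from disk.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loader = DataLoader(TensorDataset(images, labels), batch_size=4)

pool = nn.AvgPool2d(kernel_size=2)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

for batch, targets in loader:
    batch = batch.to(device)  # CUDA is first touched in the main process
    pooled = pool(batch)      # pooling runs on the GPU (if available) here
    # ... rest of the training step
```

This also pools a whole batch in one kernel launch instead of one sample at a time.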

I don't use multiprocessing explicitly. Inside the data loader I need to load large images of different sizes, preprocess them, and concatenate sub-images extracted from them. For preprocessing I need to calculate the average over windows, which is why I apply pooling.
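For what it's worth, window averages of that kind can be computed with F.avg_pool2d on a plain CPU tensor, so the pooling itself doesn't force a move to the GPU (the toy 4x4 image below is just for illustration):

```python
import torch
import torch.nn.functional as F

# Toy 1-channel "image"; real code would use the loaded RGB tensor.
img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Average over non-overlapping 2x2 windows. This runs fine on a CPU
# tensor, so it works even inside a forked DataLoader worker.
win_avg = F.avg_pool2d(img, kernel_size=2)
# win_avg.squeeze() is [[2.5, 4.5], [10.5, 12.5]]
```

Whether the CPU is fast enough for the large images in question is a separate trade-off.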

Ok, then I think you want to write a custom Dataset class and use a DataLoader with num_workers=0.

That way the DataLoader doesn't do any multiprocessing, so you can use your GPU for preprocessing inside the Dataset class.
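A minimal sketch of that setup, assuming hypothetical in-memory tensors in place of the real image-loading code:

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class PooledDataset(Dataset):
    """Sketch of a Dataset that pools each sample inside __getitem__."""

    def __init__(self, images, labels, device):
        self.images = images  # stand-in for reading files from disk
        self.labels = labels
        self.device = device
        self.pool = nn.AvgPool2d(kernel_size=2)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Safe with num_workers=0: no forked worker touches CUDA.
        img = self.images[index].to(self.device)
        return self.pool(img), self.labels[index]

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
ds = PooledDataset(torch.randn(8, 3, 32, 32),
                   torch.randint(0, 10, (8,)), device)
loader = DataLoader(ds, batch_size=4, num_workers=0)  # no worker processes
```

With num_workers=0 the samples are produced in the main process, so the .to(device) call no longer hits the fork restriction.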


Great. Thanks a lot.