Parallelize a single (ConvTranspose3d) layer

I have an architecture built to process large 3D images. During a forward pass (with batch_size=1), GPU memory overflows at the 3D transposed convolution.

self.add_module('U_conv_'+str(i+1), nn.ConvTranspose3d(channel[i], channel[i+1], kernel_size, stride=2, padding=padd, output_padding=out_padd, bias=True))

File "/home/ivan/", line 149, in forward
    x = self._modules['U_conv_'+str(i+1)](x)
File "/home/ivan/.local/lib/python2.7/site-packages/torch/nn/modules/", line 477, in __call__
    result = self.forward(*input, **kwargs)
File "/home/ivan/.local/lib/python2.7/site-packages/torch/nn/modules/", line 818, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: CUDA error: out of memory

Is there a way to parallelize only a single layer over multiple GPUs? (Due to the large size of the image, parallelizing different layers across GPUs didn't solve the issue.)

You can manually keep half of your model on GPU 0 and the other half on GPU 1. You'll also have to correspondingly update the device of each tensor to match the part of the model it is going into.

Ideally you would divide the model roughly sequentially into halves, place them on different GPUs, and during the forward pass move the data from GPU 0 to GPU 1 (or vice versa) depending on where the next part of the model is located.

This is called model parallelism; a better explanation can be found here.
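A minimal sketch of what this split might look like. The module names, channel counts, and device choices here are assumptions for illustration, not the poster's actual architecture; the sketch falls back to CPU when two GPUs are not available so the pattern is still visible:

```python
import torch
import torch.nn as nn

# Assumed devices: use two GPUs if present, otherwise fall back to CPU.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device('cuda:0') if two_gpus else torch.device('cpu')
dev1 = torch.device('cuda:1') if two_gpus else torch.device('cpu')

class SplitNet(nn.Module):
    """Hypothetical two-stage model with halves on different devices."""
    def __init__(self):
        super(SplitNet, self).__init__()
        # First half lives on dev0, second half on dev1.
        self.down = nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1).to(dev0)
        self.up = nn.ConvTranspose3d(8, 1, kernel_size=3, stride=2,
                                     padding=1, output_padding=1).to(dev1)

    def forward(self, x):
        x = self.down(x.to(dev0))
        # Move the intermediate activation to the device holding the next half.
        x = x.to(dev1)
        return self.up(x)

net = SplitNet()
out = net(torch.randn(1, 1, 16, 16, 16))
print(out.shape)  # torch.Size([1, 1, 16, 16, 16])
```

The key points are the explicit `.to(dev)` calls on both the submodules (at construction) and the activation (in `forward`); PyTorch will not move tensors between devices for you.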

Thanks for the response. As I mentioned, due to the large size of the image, parallelizing different layers didn't solve the issue. I have a heavy upsampling layer in the model and memory overshoots there, even when I move that layer's computation to a separate GPU.

Ohhh sorry, I thought it was going out of memory up to the ConvTranspose3d layer.
In that case I am not sure if it's possible.