Parallelize four conv2d layers on GPU

I have an input tensor of size [1,3,4,100,100], which corresponds to [batch size, channels, depth, width, height].
I want to apply a 2d convolution to each depth slice, so I need four 2d convolutions. After doing this,
I stack the results back along the depth dimension.
Code:

 x = torch.ones([1,3,4,100,100], dtype=torch.float32).cuda()
c_1 = nn.Conv2d(in_channels=3, out_channels=100, kernel_size=[3,3], padding=1).cuda() # all conv layers have the same parameters
c_2 = nn.Conv2d(3,100,3,padding=1).cuda()
c_3 = nn.Conv2d(3,100,3,padding=1).cuda()
c_4 = nn.Conv2d(3,100,3,padding=1).cuda()
#### Can I do this in parallel?
pred_1 = c_1(x[:,:,0,:])
pred_2 = c_2(x[:,:,1,:])
pred_3 = c_3(x[:,:,2,:])
pred_4 = c_4(x[:,:,3,:])
###
pred = torch.stack([pred_1,pred_2,pred_3,pred_4], dim=2)
# pred has size [1,100,4,100,100]

This calculates the convolutions one after the other, but each convolution is completely independent of the others; they all take separate inputs.
Is it possible to calculate all the convolutions at the same time on the GPU?

It’s a bit hacky, but you could parallelize this with a single Conv2d using the groups parameter, given that all the convolution layers have the same number of input and output channels, the same kernel size, etc.

A small example:
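Roughly along these lines (a sketch with the shapes from your post hard-coded; it assumes all four layers share the kernel size and padding above). The four layers are fused into one grouped Conv2d with 4*3 input channels and 4*100 output channels, the depth dimension is folded into the channel dimension before the convolution, and the output channels are split back into depth afterwards:

import torch
import torch.nn as nn

x = torch.ones([1, 3, 4, 100, 100], dtype=torch.float32).cuda()

# One grouped conv replaces the four separate layers: groups=4 splits the
# 12 input channels into 4 independent groups of 3 channels, and each group
# produces 100 of the 400 output channels, like 4 independent Conv2d(3, 100).
c = nn.Conv2d(in_channels=4 * 3, out_channels=4 * 100,
              kernel_size=3, padding=1, groups=4).cuda()

# Fold depth into the channel dimension: [1, 3, 4, 100, 100] -> [1, 4*3, 100, 100],
# so the 3 channels of each depth slice form one contiguous group.
x_flat = x.permute(0, 2, 1, 3, 4).reshape(1, 4 * 3, 100, 100)

out = c(x_flat)  # [1, 400, 100, 100]

# Split the 400 output channels back into (depth, channels) and restore
# the original layout: [1, 100, 4, 100, 100].
pred = out.view(1, 4, 100, 100, 100).permute(0, 2, 1, 3, 4)
print(pred.shape)  # torch.Size([1, 100, 4, 100, 100])

If you want the fused layer to reproduce your existing c_1 ... c_4 exactly, you can copy their parameters into it; the grouped weight has shape [400, 3, 3, 3], which is just the four [100, 3, 3, 3] weights concatenated along dim 0:

with torch.no_grad():
    c.weight.copy_(torch.cat([c_1.weight, c_2.weight, c_3.weight, c_4.weight], dim=0))
    c.bias.copy_(torch.cat([c_1.bias, c_2.bias, c_3.bias, c_4.bias], dim=0))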