Custom conv2d layer that performs convolutions on selected input channels fails with an out-of-memory error

Hello!

I implemented a conv2d layer that takes selected channels of the input using a binary mask and convolves them with a weight filter. The code snippet is below (a small usage sketch follows the class). Briefly: the first argument passed to conv2d is the input reduced to only the selected channels, and the second argument is the corresponding weight filter. The assumption is that the number of ones in each binary mask equals the number of channels in the weight filter:

import torch

class execute2DConvolution(torch.nn.Module):
    def __init__(self, mask, inStride=1, inPadding=0, inDilation=1, inGroups=1):
        super(execute2DConvolution, self).__init__()
        self.cStride = inStride
        self.cPad = inPadding
        self.cDil = inDilation
        self.cGrp = inGroups
        # mask has shape (outputChannels, inputChannels, H, W); mask[i] marks
        # the input channels used by output filter i.
        self.mask = mask

    def forward(self, dataIn, weightIn):
        # For each output filter i: repeat mask[i] over the batch, pull out the
        # selected channels, reshape them to (batch, selectedChannels, H, W) and
        # convolve with the i-th filter; then concatenate along the channel dim.
        out = [torch.nn.functional.conv2d(
                   torch.masked_select(dataIn, torch.unsqueeze(self.mask[i], 0)
                                       .repeat(dataIn.size(0), 1, 1, 1))
                   .view(dataIn.size(0), weightIn.size(1), dataIn.size(2), dataIn.size(3)),
                   torch.unsqueeze(weightIn[i], 0),
                   bias=None, stride=self.cStride, padding=self.cPad,
                   dilation=self.cDil, groups=self.cGrp)
               for i in range(weightIn.size(0))]
        return torch.cat(out, 1)
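
For reference, this is roughly how I build the mask and call the layer. The sizes, names, and random mask construction below are only illustrative, not my actual network (I am on a Variable-era PyTorch build, as the traceback further down shows):

import torch
from torch.autograd import Variable

# Illustrative sizes only -- the real network is much larger.
batch, inCh, selCh, outCh, h, w, k = 4, 8, 3, 16, 32, 32, 3

# One binary mask per output filter; each mask selects exactly selCh input
# channels, matching the channel dimension of the weight filters.
mask = torch.zeros(outCh, inCh, h, w).byte()
for i in range(outCh):
    for c in torch.randperm(inCh)[:selCh]:
        mask[i, c] = 1

weights = Variable(torch.randn(outCh, selCh, k, k))
x = Variable(torch.randn(batch, inCh, h, w))

conv = execute2DConvolution(mask, inPadding=1)
out = conv(x, weights)   # expected shape: (batch, outCh, h, w)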

However, when I run this, I get an out-of-memory error. How do I go about fixing this? Also, would this method be too slow? I am looping over the output dimension, which can be up to 1000 or more (a rough count of the per-iteration temporaries is at the end of the post). Is there a better way of doing this? Any help is appreciated. Thanks! Here is a more detailed view of the memory error.

out = [torch.nn.functional.conv2d(torch.masked_select(dataIn,torch.unsqueeze(self.mask[i],0).repeat(dataIn.size()[0],1,1,1)).view(dataIn.size(0),weightIn.size(1),dataIn.size(2),dataIn.size(3)), torch.unsqueeze(weightIn[i],0), bias=None, stride=self.cStride, padding=self.cPad,dilation=self.cDil, groups=self.cGrp) for i in range(weightIn.size(0))]
File "/home/ameya/anaconda3/lib/python3.5/site-packages/torch/autograd/variable.py", line 719, in masked_select
return MaskedSelect.apply(self, mask)
File "/home/ameya/anaconda3/lib/python3.5/site-packages/torch/autograd/_functions/tensor.py", line 468, in forward
return tensor.masked_select(mask)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503968623488/work/torch/lib/THC/generic/THCStorage.cu:66
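
To give a sense of the scale I am worried about, here is a rough back-of-the-envelope count of the temporaries each loop iteration creates. All sizes below are hypothetical, not my exact configuration:

# Purely hypothetical sizes, just to illustrate the scale of the temporaries.
batch, inCh, selCh, h, w = 32, 512, 64, 28, 28
outCh = 1000                            # number of output filters = loop iterations

repeatedMask = batch * inCh * h * w     # mask[i] repeated to the input's shape
selected = batch * selCh * h * w        # masked_select result after the view
perIter = repeatedMask + selected       # temporary elements per output filter

print("elements per iteration:", perIter)
print("elements over all %d filters:" % outCh, perIter * outCh)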