Masking out locations in convolutional kernels

mjdmahsneh · May 24, 2021, 9:08pm

Hello Pytorchers!

I am trying to implement a 3D convolutional layer where kernels have some sampling locations completely masked out. In particular, I want to pass a binary mask, such that locations that are set to zero do NOT contribute to the learning process.

In the example below, I am using a cross-shaped mask, then multiplying it by the convolutional kernel weights, to eliminate responses from zeroed positions (where mask == 0).

Now my question comes in two parts:

1. Am I achieving this correctly? Mind that my aim is to keep using this constant mask during all forward/backward passes, with no updates on the mask weights (so that at any point during training, only locations of interest are being learned).
2. Is there a way to also customize the bias term, such that it is only added to locations of interest?

import torch
import torch.nn as nn

class MaskedConv3d(nn.Module):
    def __init__(self, n_in, n_out, filter_mask, pad=1):
        super().__init__()
        self.kernel_size = tuple(filter_mask.shape)
        # Buffer: follows the module across devices but is never trained.
        self.register_buffer('filter_mask', filter_mask)
        self.conv = nn.Conv3d(
            in_channels=n_in,
            out_channels=n_out,
            kernel_size=self.kernel_size,
            stride=1,
            bias=False,
            padding=pad
        )

    def forward(self, x):
        self._mask_conv_filter()
        return self.conv(x)

    def _mask_conv_filter(self):
        # Zero the masked kernel positions before each convolution.
        # This re-wraps the masked weights in a new nn.Parameter on every call.
        with torch.no_grad():
            self.conv.weight = nn.Parameter(self.conv.weight * self.filter_mask)

# define mask:
mask = torch.tensor([[[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]],

                     [[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]],

                     [[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]]])
   
x = torch.randn(1, 1, 6, 6, 6)

maskedconv3d = MaskedConv3d(1, 8, mask)
out = maskedconv3d(x)
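One thing that may be worth checking in the module above: as far as I can tell, assigning a fresh nn.Parameter inside forward replaces the tensor object that an optimizer was given earlier, so the optimizer could end up stepping a stale copy. A minimal sketch of a variant that keeps one persistent weight and multiplies by the mask inside the graph is shown here. The class name FunctionalMaskedConv3d and the choice of initializer are assumptions made for illustration, not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalMaskedConv3d(nn.Module):  # hypothetical name
    def __init__(self, n_in, n_out, filter_mask, pad=1):
        super().__init__()
        self.pad = pad
        self.register_buffer("filter_mask", filter_mask)
        # One persistent parameter; it is never re-created in forward.
        self.weight = nn.Parameter(torch.empty(n_out, n_in, *filter_mask.shape))
        # Initializer chosen for the sketch; any init would do.
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x):
        # The mask is applied inside the autograd graph, so gradients at
        # masked kernel positions are multiplied by zero.
        return F.conv3d(x, self.weight * self.filter_mask, padding=self.pad)


cross = torch.tensor([[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]])
mask = cross.repeat(3, 1, 1)  # same cross in each of the 3 depth slices

layer = FunctionalMaskedConv3d(1, 8, mask)
weight_before = layer.weight

out = layer(torch.randn(1, 1, 6, 6, 6))
out.sum().backward()

# Gradients at the masked kernel positions (where mask == 0).
masked_grad = layer.weight.grad[:, :, mask == 0]
```

With this layout the stored weights at masked positions keep their initial values, but they never influence the output and receive zero gradient, so a plain optimizer without weight decay leaves them alone.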

eqy · May 24, 2021, 11:40pm

I don’t see any obvious problems here, but you can do some simple tests like running your layer on a ones tensor input and checking that the results are what you expect based on the mask. If you are using batchnorm layers after the convolution, you can avoid the bias term entirely, as it will be effectively undone by the batchnorm. Additionally, I don’t think the bias is applied before the convolution, so it shouldn’t be affected by (or affect) the mask that you are using.
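A minimal sketch of the ones-tensor test suggested above, assuming an all-ones kernel and no padding so the expected value is easy to reason about: every output element should equal the number of unmasked kernel positions (15 for the 3x3x3 cross mask). The functional call F.conv3d is used here only to keep the check independent of any particular module.

```python
import torch
import torch.nn.functional as F

cross = torch.tensor([[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]])
mask = cross.repeat(3, 1, 1)  # shape (3, 3, 3), 15 ones and 12 zeros

# All-ones kernel with the mask applied: (out_channels, in_channels, D, H, W).
weight = torch.ones(1, 1, 3, 3, 3) * mask

x = torch.ones(1, 1, 6, 6, 6)
out = F.conv3d(x, weight)  # no padding, so the output is 4x4x4

# Each output element sums the ones seen through the unmasked taps.
expected = mask.sum()
print(out.shape, expected.item(), torch.allclose(out, expected.expand_as(out)))
```

If a masked position leaked into the result, the output values would come out larger than the count of ones in the mask.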

mjdmahsneh · May 25, 2021, 12:05am

@eqy Thank you for your response. As you suggested, I have run a few tests to make sure I am getting the expected output, and it seems to be working as expected!

However, the fact that the bias is added after the convolution means that positions where mask(convolution(x))==0 will evaluate to 0+bias (after adding the bias term). So for now I have set bias=False, but it would be nice to also customize the bias term along with the masking process!

eqy · May 25, 2021, 12:12am

I’m not sure I understand what you mean by positions where mask(convolution(x))==0, since in the current code it looks like the mask is independent of position (the same cross pattern is applied everywhere).

mjdmahsneh · May 25, 2021, 12:16am

Sorry, it wasn’t the best way to explain my point. I meant to say that the response of the convolutional kernel would have values that are equal to zero (the masked out locations). These will have the bias term added to them and hence will evaluate to 0+bias. Kindly correct me if I am missing something.
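For reference, a small sketch of how the bias of nn.Conv3d behaves: it holds one scalar per output channel, and that scalar is added at every output location, independent of which kernel taps are masked. Feeding an all-zero input makes this visible, because the convolution response is zero everywhere and only the bias remains. The layer sizes below are arbitrary choices for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv3d(1, 8, kernel_size=3, padding=1, bias=True)

# With an all-zero input the weighted sum is zero at every location.
x = torch.zeros(1, 1, 6, 6, 6)
with torch.no_grad():
    out = conv(x)

# The bias has shape (out_channels,) and is broadcast over the output volume.
expected = conv.bias.detach().view(1, -1, 1, 1, 1).expand_as(out)
print(conv.bias.shape, torch.allclose(out, expected))
```

So the kernel mask and the bias act on different things: the mask zeroes kernel taps, while the bias shifts whole output channels.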

eqy · May 25, 2021, 12:58am

Ok, do you mean that you want no bias to be applied for input patches that look like

1 0 1
0 0 0
1 0 1

? I think that is trickier to achieve, since it almost looks like some kind of data-dependent control flow. However, I would be surprised if this made a large difference in the accuracy of a model, as these are exactly the kind of patterns that I would expect a convolutional model to learn around/adapt to.
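If someone did want to try the data-dependent variant, one possible sketch is to gate the bias with an indicator of whether any unmasked tap sees a nonzero input. Everything here is an assumption made for illustration: the class name GatedBiasMaskedConv3d, the exact-zero test on the input, and the choice to sum absolute values over input channels. The gate itself is not differentiable, and whether this helps a model at all is untested.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedBiasMaskedConv3d(nn.Module):  # hypothetical name
    def __init__(self, n_in, n_out, filter_mask, pad=1):
        super().__init__()
        self.pad = pad
        self.register_buffer("filter_mask", filter_mask)
        self.weight = nn.Parameter(0.1 * torch.randn(n_out, n_in, *filter_mask.shape))
        self.bias = nn.Parameter(torch.zeros(n_out))

    def forward(self, x):
        response = F.conv3d(x, self.weight * self.filter_mask, padding=self.pad)
        # Total input magnitude per voxel, summed over input channels.
        energy = x.abs().sum(dim=1, keepdim=True)
        # Sum of that magnitude over the unmasked taps only.
        support = F.conv3d(energy, self.filter_mask[None, None], padding=self.pad)
        # 1 where some unmasked tap sees a nonzero value, else 0.
        gate = (support > 0).to(response.dtype)
        return response + self.bias.view(1, -1, 1, 1, 1) * gate


cross = torch.tensor([[0., 1., 0.],
                      [1., 1., 1.],
                      [0., 1., 0.]])
mask = cross.repeat(3, 1, 1)

layer = GatedBiasMaskedConv3d(1, 8, mask)
nn.init.constant_(layer.bias, 1.0)  # make the bias visible in the output

x = torch.zeros(1, 1, 6, 6, 6)
x[0, 0, 3, 3, 3] = 1.0  # a single nonzero voxel

with torch.no_grad():
    out = layer(x)
```

In this example the output at (3, 3, 3) sees the voxel through the center tap and gets the bias, while the output at (3, 2, 2) would only see it through a masked corner tap, so it stays at zero.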

mjdmahsneh · May 25, 2021, 12:14pm

I agree, I think it would have a minimal effect on the model. Thank you for your response and help, it is very much appreciated.

Vasquez122 · May 31, 2021, 9:51am

This is a great inspiring article. I am pretty much pleased with your good work!

mjdmahsneh · June 2, 2021, 12:42am

Thank you @Vasquez122, I actually built upon the thread "Applying custom mask on kernel for CNN". Thanks to their efforts.

Ahmed_Tarek · July 15, 2025, 2:30pm

Thank you for sharing your implementation; it was very helpful.

I have a minor suggestion that can simplify the code. We can actually skip this section:

with torch.no_grad():
    self.conv.weight = nn.Parameter(self.conv.weight * self.filter_mask)

My understanding is that register_buffer treats the tensor as a non-trainable parameter, so wrapping it with nn.Parameter and using torch.no_grad isn’t really necessary.

Here’s how the code could look after simplifying it:

class MaskedConv2d(nn.Module):
    def __init__(self, cin, cout, kernel, mask, padding='same'):
        super().__init__()
        self.conv_ = nn.Conv2d(in_channels=cin,
                               out_channels=cout,
                               kernel_size=kernel,
                               padding=padding,
                               bias=False)
        # Buffer: saved and moved with the module, but not trained.
        self.register_buffer('mask', mask)

    def forward(self, x):
        # Overwrite the stored weights with their masked version on each call.
        self.conv_.weight.data = self.conv_.weight.data * self.mask
        return self.conv_(x)
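Another option that might be worth a look is PyTorch's parametrization utility, torch.nn.utils.parametrize.register_parametrization, which recomputes the masked weight from an underlying trainable tensor each time conv.weight is read. This is only a sketch, assuming a PyTorch version that ships torch.nn.utils.parametrize; the helper name KernelMask is made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class KernelMask(nn.Module):  # hypothetical helper name
    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask)

    def forward(self, weight):
        # Called whenever conv.weight is accessed.
        return weight * self.mask


mask = torch.tensor([[0., 1., 0.],
                     [1., 1., 1.],
                     [0., 1., 0.]])

conv = nn.Conv2d(1, 4, kernel_size=3, padding="same", bias=False)
parametrize.register_parametrization(conv, "weight", KernelMask(mask))

x = torch.randn(2, 1, 8, 8)
out = conv(x)
out.sum().backward()

# The trainable tensor now lives under conv.parametrizations.
raw = conv.parametrizations.weight.original
masked_weight = conv.weight[:, :, mask == 0]
masked_grad = raw.grad[:, :, mask == 0]
```

Here the forward method stays untouched, conv.weight is always the masked kernel, and the gradient reaching the underlying tensor is zero at the masked positions.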