Padding and masking in convolution

Hi, I’m using PyTorch to encode 1-D inputs with 1-D convolutions.
I have two questions, since the inputs within a batch have different lengths.

1. Currently, I maintain a mask tensor for every layer and mask each layer’s output myself. Because a layer’s input and output sizes may differ, my forward method has to recompute the mask at each layer.
Q: Is this the usual way to do it?

2. For now, I only mask the output of each layer and do nothing about the gradients.
Q: Is masking the outputs enough for this to work properly? Should I also mask the gradients in a hook registered with register_backward_hook?

Thanks a lot if you can give me some hints.

Hi, @apaszke @ptrblck @tom @smth,
Could you give me some hints about my question? Thanks.

I’m not sure I fully understand your setup, nor am I an expert on masking and convolutions, but here is my attempt at answers:

  1. That depends on what you are doing, but as soon as the convolution has a bias, the masked-off positions will have the bias added even if the inputs there are all zero.
    I would imagine that there are more and less efficient ways to compute the mask. For example, I might try to max-pool the lower-level mask with the layer’s kernel size, stride and padding if you want a mask that is “all outputs where an unmasked input arrived”. Or you could use a negative mask (i.e. -1 at input, 0 for non-input) and max-pool that to get “all outputs where only inputs arrived”; then you’d have to check that you come out with the right sign at the top. That way it should be relatively efficient to propagate the masks.
  2. As the loss doesn’t depend on the positions you “masked off”, the gradients should be zero for those positions automatically (unless you have NaNs, but then you’re screwed anyway and should adjust your inputs).
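Both points can be checked with a small experiment. The following is only a sketch under assumptions: the layer settings, the tensor sizes and the helper name `propagate_mask` are made up for illustration, and the mask is pooled with the same kernel size, stride and padding as the convolution. Note that `F.max_pool1d` pads with negative infinity, so the convolution’s own border padding is ignored when the mask is pooled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical layer settings, chosen only for illustration.
kernel_size, stride, padding = 3, 2, 1
conv = nn.Conv1d(4, 8, kernel_size, stride=stride, padding=padding)

# Batch of 2 sequences padded to length 10; valid lengths are 10 and 6.
lengths = torch.tensor([10, 6])
x_mask = (torch.arange(10)[None, :] < lengths[:, None]).float()  # N x L
x = torch.randn(2, 4, 10) * x_mask[:, None, :]  # zero the padded positions


def propagate_mask(mask, mode="any"):
    """Push an N x L 0/1 mask through the same windows as the convolution."""
    m = mask[:, None, :]  # max_pool1d expects N x C x L
    if mode == "any":
        # 1 wherever at least one valid input falls inside the window.
        out = F.max_pool1d(m, kernel_size, stride, padding)
    else:
        # 1 only where every real position in the window is valid:
        # max-pool the negated mask (-1 = valid, 0 = padding), then flip the sign.
        out = -F.max_pool1d(-m, kernel_size, stride, padding)
    return out.squeeze(1)  # N x L'


any_mask = propagate_mask(x_mask, "any")
all_mask = propagate_mask(x_mask, "all")

# Point 2: mask the conv output and look at the gradient that reaches it.
y = conv(x)
y.retain_grad()  # y is not a leaf, so ask autograd to keep its gradient
out = y * any_mask[:, None, :]  # also removes the bias at masked positions
out.sum().backward()

masked_positions = (any_mask[:, None, :] == 0).expand_as(y)
grad_at_masked = y.grad[masked_positions]
print(any_mask[1], all_mask[1], grad_at_masked.abs().max())
```

For the shorter sequence the two modes differ only at the output position whose window straddles the boundary between valid and padded inputs. The gradient arriving at the masked-off positions of `y` is exactly zero without any backward hook.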

I hope this helps, and I look forward to hearing how it works out for your project.

Best regards


Hi tom, thanks for your reply. Let me describe my solution in more detail.

Imagine the input (x) has shape N * C * L and is zero-padded at the invalid positions. Alongside it there is a 0/1 mask (x_mask) of shape N * L, where 0 means padding and 1 means a valid position. After the convolution, the output (y) has shape N * C’ * L’ and its mask (y_mask) has shape N * L’. To get y_mask, I have to compute how the valid length changes for every sample in the batch. This step seems inefficient and inelegant to me because it loops over N. Finally, I compute y * y_mask to get the input for the next layer, so I don’t need to worry about the bias. Pseudocode is below.

def forward(self, masked_x, x_mask):  # masked_x: N*C*L, x_mask: N*L (1 = valid, 0 = padding)
    y = self.conv(masked_x)  # y: N*C'*L'
    y_mask = x_mask  # reused as-is when the layer keeps the length unchanged
    if x_mask is not None and self.size_changed:
        x_valid_len = mask_2_size(x_mask)  # valid length per sample, shape N*1
        for n in range(x_valid_len.size(0)):  # loop over the batch
            x_valid_len[n] = self.output_size(x_valid_len[n])
        y_mask = size_2_mask(x_valid_len)  # y_mask: N*L'
    if y_mask is not None:
        # Zero the padded positions (this also removes the bias there).
        y = y * y_mask.unsqueeze(1).expand_as(y)
    return y, y_mask

mask_2_size, size_2_mask and self.output_size are functions I wrote.
Thanks for your example, but I think it doesn’t fit my case, or maybe I misunderstood your idea.
I would be very grateful if you could give me some examples in pseudocode.

Well, you could keep your lengths in a tensor and compute the new lengths batch-wise, or share something about how the lengths relate.
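A minimal sketch of the batch-wise idea, under some assumptions: the names `conv_output_lengths`, `lengths_to_mask` and `MaskedConv1d` are hypothetical, the layer uses integer padding (not a string such as "same"), and `lengths` lives on the same device as the input. The length update is the output-length formula from the Conv1d documentation applied to the whole lengths tensor at once, so no Python loop over N is needed.

```python
import torch
import torch.nn as nn


def conv_output_lengths(lengths, conv):
    """Valid length after `conv` for every sample, computed in one tensor op."""
    k, s = conv.kernel_size[0], conv.stride[0]
    p, d = conv.padding[0], conv.dilation[0]  # assumes integer padding
    # floor((L + 2p - d(k - 1) - 1) / s) + 1, as for the Conv1d output length.
    return torch.div(lengths + 2 * p - d * (k - 1) - 1, s, rounding_mode="floor") + 1


def lengths_to_mask(lengths, max_len):
    """N x max_len float mask: 1 where position < length, 0 elsewhere."""
    positions = torch.arange(max_len, device=lengths.device)
    return (positions[None, :] < lengths[:, None]).float()


class MaskedConv1d(nn.Module):
    """Conv1d that carries per-sample lengths along and masks its output."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.conv = nn.Conv1d(*args, **kwargs)

    def forward(self, x, lengths):
        y = self.conv(x)  # N x C' x L'
        out_lengths = conv_output_lengths(lengths, self.conv)
        y_mask = lengths_to_mask(out_lengths, y.size(-1))  # N x L'
        return y * y_mask[:, None, :], out_lengths


# Usage with made-up sizes: two sequences of valid length 10 and 6.
torch.manual_seed(0)
lengths = torch.tensor([10, 6])
x = torch.randn(2, 4, 10) * lengths_to_mask(lengths, 10)[:, None, :]
layer = MaskedConv1d(4, 8, kernel_size=3, stride=2, padding=1)
y, out_lengths = layer(x, lengths)
print(y.shape, out_lengths)
```

Passing lengths instead of masks between layers keeps the bookkeeping to one small integer tensor per batch. The resulting length is the one each sequence would get if it were convolved on its own without batch padding, which is one possible convention among several.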

Best regards