How to treat variable-length sequences in Conv1d layers?

I’ve got a set of sequences that are shaped like this:

# batch_size x features x length
seq = tensor([[[-1.6664,  1.3793, -1.0168, -0.2382,  0.3806, -1.4693,  0.0000,
           0.0000],
         [ 0.9992, -0.7673, -0.4289, -0.6654, -1.0930, -0.7454,  0.0000,
           0.0000],
         [ 1.4257,  0.9920, -0.1028,  0.8902,  0.5898,  0.1072,  0.0000,
           0.0000],
         [ 0.1998,  1.0295, -0.2461,  0.1640,  0.5859,  0.6036,  0.0000,
           0.0000]],

        [[ 0.7872,  0.3735, -0.5858, -0.6331, -1.7811, -0.2623,  0.8302,
          -1.5729],
         [-1.7866, -0.9432, -0.0326,  0.5870,  0.9642,  2.0408, -0.8691,
          -1.8870],
         [-2.1594, -0.4498,  1.0198,  0.7867,  0.9520, -0.8631, -0.5116,
           0.2256],
         [ 2.0375,  0.5512, -1.4802, -0.5710,  0.8688,  0.8487,  1.3697,
           1.0811]]])

As you can see, the last 2 positions of the 0th sequence are just 0.0s, representing padding.
What would be the recommendation here? Even a layer like this is “compatible”:

conv = nn.Conv1d(n_input_features, n_output_features, kernel_size=3)
o = conv(seq) # this works

For example, if I had a seq tensor of N x C x L with a kernel size of, say, 3, then I get a new tensor of N x C x (L-2), and this applies even to sequences with 0 padding.
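For instance, checking this with a throwaway tensor (sizes made up to match the example above):

```python
import torch
import torch.nn as nn

n_features = 4
conv = nn.Conv1d(n_features, n_features, kernel_size=3)

seq = torch.randn(2, n_features, 8)  # N x C x L
o = conv(seq)
print(o.shape)  # torch.Size([2, 4, 6]) -- L shrinks by kernel_size - 1
```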

I’ve read a suggestion here to clip the output. Would the idea then be to do something like this?

o = conv(seq)
# o[:, :, :-seq_lengths]  something along these lines?

I’m asking as I’m trying to migrate a Keras 1D convolution to PyTorch. The Keras layer is written as a “MaskedConvolution” that involves a Masking layer from Keras (presumably to handle variable lengths?). From my understanding it’s not a true masked convolution per se, where the kernel has a 0…

In addition to the above about handling variable-length sequences in nn.Conv1d, if anyone has tips on what the PyTorch equivalent of Keras’ Masking layer would be, I’d also be keen to hear!

Your slicing operation would remove the padding indices in all samples, so I’m unsure if this fits your use case.

Could you explain what this layer is doing, i.e. how the “masking” is performed and how the outputs and gradients are changed by it?

Hi Piotr,

Thanks – so the code is a bit unusual, but here we are:

## this is all Keras that I'm trying to understand
class MaskedConvolution1D(Convolution1D):
    def __init__(self, *args, **kwargs):
        super(MaskedConvolution1D, self).__init__(*args, **kwargs)

    def compute_mask(self, input, input_mask=None):
        return input_mask

    def call(self, x, mask=None):
        assert mask is not None
        mask = K.expand_dims(mask, axis=-1)
        x = super(MaskedConvolution1D, self).call(x)
        return x * K.cast(mask, K.floatx())

class MaskingByLambda(Layer):
    def __init__(self, func, **kwargs):
        self.supports_masking = True
        self.mask_func = func
        super(MaskingByLambda, self).__init__(**kwargs)

    def compute_mask(self, input, input_mask=None):
        return self.mask_func(input, input_mask)

    def call(self, x, mask=None):
        exd_mask = K.expand_dims(self.mask_func(x, mask), axis=-1)
        return x * K.cast(exd_mask, K.floatx())

def lambda_func(tensor):
    return lambda input, mask: tensor

def model(length):
    input = Input(shape=(length, n_features))
    mask = Input(shape=(length,))
    seq = MaskingByLambda(lambda_func(mask))(input)
    conv = MaskedConvolution1D(28, 3, padding='same', activation='relu')(seq)

The dataset I’m working on is a set of sequences, such as:

x = ["This is a sentence", "This is another longer sentence", "Short sentence", "AHHH!"]

This is then tokenised into a one-hot encoded tensor to be fed into the Conv1D layer. Most importantly, I’m wondering how nn.Conv1d can handle variable-length sequences, and how it can replicate the functionality of the MaskedConvolution1D implementation above.

Thanks!!

Conv layers can handle variable sized inputs as long as the input’s temporal or spatial size is at least as large as the kernel size. The output shape is determined by the kernel size, stride, padding, and dilation as given in the docs.
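For reference, the output-length formula from the nn.Conv1d docs can be written out directly (the helper name here is just mine):

```python
import math

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(conv1d_out_len(8, 3))             # 6 -> matches the N x C x (L - 2) case above
print(conv1d_out_len(8, 3, padding=1))  # 8 -> 'same' length at stride 1
```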

Based on the code it seems you are creating a mask and multiplying it with the output activation of the conv layer. If that’s the case, you could use exactly the same workflow in the forward method of your custom conv layer.
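A minimal sketch of that mask-building step, with the boolean mask derived from (made-up) per-sample lengths via torch.arange:

```python
import torch

lengths = torch.tensor([6, 8])  # valid (unpadded) length of each sequence
max_len = 8

# mask[i, j] is True where position j holds real data in sample i
mask = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)  # N x L
print(mask[0])  # tensor([ True,  True,  True,  True,  True,  True, False, False])
```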

Piotr, that was exactly the solution. Thanks for posting, I forgot to write this down after my Eureka moment yesterday :sweat_smile:

For anyone else wanting to do something similar, it was really a case of something like this:

class MaskedConvolution(nn.Module):
    # ...
    def forward(self, x: torch.Tensor, mask: torch.BoolTensor) -> torch.Tensor:
        o = ...  # do fancy operations here
        assert o.shape == mask.shape, "Mask shape should be the same as the output shape"
        return o * mask