How to share kernel parameters between rows in Conv2d?


So, imagine we have a 2D matrix input of shape m rows x n columns to a Conv2d layer. My goal is to preserve the number of rows of the matrix while running the same kernel along each row.

In more detail, say we apply a Conv2d layer with a (1, k) kernel to the matrix. The layer would, in my understanding, learn one kernel of size (1, k) with individual parameters per row (and per feature map), resulting in kernels k1, …, km. Instead, what I would like to do is share the parameters of a single kernel of size (1, k) across all rows. In other words, the same kernel is run along each row and produces an output “at the end of the row”. These outputs will be further transformed and fed to the loss function, where they ultimately result in a gradient that we can backpropagate to update the k parameters of the kernel, which has been jointly run over each individual row of the matrix.
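To make the intended operation concrete, here is a rough sketch of it with an explicit Python loop (the shapes m, n, k are made up for illustration): one shared weight vector of length k slides along every row.

```python
import torch

# hypothetical sizes: 4 rows, 10 columns, kernel width 3
m, n, k = 4, 10, 3

weight = torch.randn(k, requires_grad=True)  # the single shared kernel
x = torch.randn(m, n)

# slide the same k parameters along every row of the matrix
out = torch.stack([
    torch.stack([x[i, j:j + k] @ weight for j in range(n - k + 1)])
    for i in range(m)
])
print(out.shape)  # torch.Size([4, 8]), i.e. (m, n - k + 1)
```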

The idea behind this approach is to (i) make the training more efficient, as more parameters are shared, and (ii) learn a very generalized kernel that can understand each of the rows and learns their features jointly. Importantly, don’t imagine my input matrix as a picture (for which this approach would probably not make sense). Rather, imagine that each row represents a series of features and all the series are highly correlated with each other (across the columns). Then the general kernel should be able to apply to all rows simultaneously.

The problem: I have no idea how to modify Conv2d so that it uses only a single (1, k) kernel for every row. Can somebody give me a hint here? :)


Best, JZ

Hi Jay!

Is it possible that you misunderstand how Conv2d works?


>>> import torch
>>> torch.__version__
>>> conv = torch.nn.Conv2d(1, 1, kernel_size=(1, 3))
>>> conv.weight
Parameter containing:
tensor([[[[-0.2149,  0.2000, -0.0982]]]], requires_grad=True)
>>> conv.bias
Parameter containing:
tensor([-0.0693], requires_grad=True)

Conv2d has only one kernel, in this case of shape [1, 1, 1, 3], so
there is only one such kernel that can be “learned” and that one kernel
is used for all the rows of the input.


K. Frank

1 Like

Hey K,

it is very likely that I misunderstood, then ^^. I somehow had the thought that one kernel would be learned for each row when the kernel height is 1, which, now that I have reviewed the theory, doesn’t make much sense. Thanks much for clarifying it!

Best, JZ