Applying separate convolutions to each row of input?

Hey,

I have H=10 groups of time series. Each group contains C=15 correlated time series. Each time series has a length W=100. I feed the data in batches X of shape BCHW = (32,15,10,100) to my model.
What I would like to do is to independently apply 1d-convolutions to each “row” 1,…,H in the batch. At first, I used a compact workaround:

layer = nn.Conv2d(15,15,kernel_size=(1,k))
output = layer(X)

Here, the kernel has height 1, so it never spans more than one row, and the same weights are applied to every row, so it learns patterns that are common to all rows. I did this to capture joint features and interactions between the rows. However, to separate the rows further and not mix up the interactions, I would now like to add individual 1d-convolutions for each of the H rows.
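To make the weight sharing concrete, here is a small sketch (not part of my model) that checks that the Conv2d with a (1,k) kernel gives the same result as a single Conv1d applied to every row with the same weights. The kernel size k = 3, the batch size of 4, and the variable names are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, C, H, W, k = 4, 15, 10, 100, 3  # k = 3 and B = 4 are assumed values
X = torch.randn(B, C, H, W)

conv2d = nn.Conv2d(C, C, kernel_size=(1, k))
conv1d = nn.Conv1d(C, C, kernel_size=k)

# Copy the Conv2d parameters into the Conv1d so both layers share weights.
with torch.no_grad():
    conv1d.weight.copy_(conv2d.weight.squeeze(2))  # (C, C, 1, k) -> (C, C, k)
    conv1d.bias.copy_(conv2d.bias)

out2d = conv2d(X)
# Apply the one shared Conv1d to each row and stack along the row dimension.
out1d = torch.stack([conv1d(X[:, :, h, :]) for h in range(H)], dim=2)

same = torch.allclose(out2d, out1d, atol=1e-5)
print('shapes:', out2d.shape, out1d.shape)
print('shared-weight Conv1d matches Conv2d:', same)
```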

Thus, I would have to define H Conv1d layers, which are then applied to each row.

layers = []
for h in range(H):
    layers.append(nn.Conv1d(15, 15, kernel_size=k))
layers = nn.ModuleList(layers)

layers[h] would then correspond to the convolution applied to the h’th row of the matrix.
As I understand, this approach will increase the number of model parameters H-fold, as compared to the Conv2d approach.
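As a quick sanity check of the H-fold claim, the parameters of both variants can simply be counted. This is only a sketch: k = 3 is an assumed kernel size and n_params is a helper name made up for this example.

```python
import torch.nn as nn

C, H, k = 15, 10, 3  # k = 3 is an assumed kernel size


def n_params(module):
    """Total number of parameters (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters())


shared = nn.Conv2d(C, C, kernel_size=(1, k))
per_row = nn.ModuleList([nn.Conv1d(C, C, kernel_size=k) for _ in range(H)])

n_shared = n_params(shared)    # C*C*k weights + C biases = 690
n_per_row = n_params(per_row)  # H independent copies = 6900
print(n_shared, n_per_row)
```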

The output of the Conv1d approach would be computed like this:

outputs = []
for h in range(H):
    outputs.append(layers[h](X[:, :, h, :]).unsqueeze(2))
output = torch.cat(outputs, dim=2)

My question is how to parallelize this, i.e. how to apply the Conv1d layers to all rows of the batch simultaneously, so that the compute time stays approximately the same as before and does not increase H-fold as well. Due to the larger number of parameters it will inevitably take somewhat longer, but if the loop can be parallelized, it should still save some time, right? How would that be possible?

Thanks!

Best, JZ

Hi Jay!

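You can use the groups feature of Conv1d. To do so you will have
to merge your channels (C) and rows (H) dimensions together, using
.transpose() first so that they get merged in the correct order:
rows outermost, so that each block of C consecutive merged channels
belongs to a single row and therefore to a single group.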

Here is an illustrative script:

import torch
print (torch.__version__)

_ = torch.manual_seed (2022)

B = 3
C = 15
H = 10
W = 100

X = torch.randn (B, C, H, W)
print ('X.shape:', X.shape)

k = 3

layers = []
for h in range(H):
    layers.append (torch.nn.Conv1d (C, C, kernel_size=k))
layers = torch.nn.ModuleList (layers)

# "row-wise" convolution, for-loop version
outputs = []
for  h in range (H): 
    outputs.append (layers[h] (X[:, :, h, :]).unsqueeze (2))
output = torch.cat (outputs, dim=2)
print ('output.shape:', output.shape)

# in order to verify that the two formulations are equivalent we must non-randomly
# initialize grouped_conv to concatenated weights and biases from layers
grouped_weight = []
grouped_bias = []
for  l in layers:
    grouped_weight.append (l.weight)
    grouped_bias.append (l.bias)
grouped_weight = torch.cat (grouped_weight, dim = 0)
grouped_bias = torch.cat (grouped_bias, dim = 0)

grouped_conv = torch.nn.Conv1d (H * C, H * C, kernel_size = k, groups = H)
with torch.no_grad():   # reinitialize grouped_conv to agree with layers
    _ = grouped_conv.weight.copy_ (grouped_weight)
    _ = grouped_conv.bias.copy_ (grouped_bias)

# "row-wise" convolution, loop-free version
outputB = grouped_conv (X.transpose (1, 2).reshape (B, H * C, -1)).reshape (B, H, C, -1).transpose (1, 2)
print ('outputB.shape:', outputB.shape)

# check that the two versions agree
print ('torch.allclose (output, outputB, atol = 1.e-6):', torch.allclose (output, outputB, atol = 1.e-6))

And here is its output:

1.10.2
X.shape: torch.Size([3, 15, 10, 100])
output.shape: torch.Size([3, 15, 10, 98])
outputB.shape: torch.Size([3, 15, 10, 98])
torch.allclose (output, outputB, atol = 1.e-6): True

Note that there is no need to explicitly initialize the weight and bias of
grouped_conv; you can use the built-in random initialization. I only
performed the explicit initialization in the example so that we could
cross-check the results of the two methods.
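For use inside a model, the grouped convolution and the reshaping could be wrapped in a small module that relies on the default initialization. The following is a minimal sketch under that assumption, not a definitive implementation; the class name RowwiseConv1d and its argument names are hypothetical. The last few lines also check that changing one row of the input leaves the outputs of the other rows untouched.

```python
import torch
import torch.nn as nn


class RowwiseConv1d(nn.Module):
    """Hypothetical wrapper: an independent Conv1d per row via groups."""

    def __init__(self, channels, rows, kernel_size):
        super().__init__()
        # One group per row; each group maps `channels` -> `channels`.
        self.conv = nn.Conv1d(rows * channels, rows * channels,
                              kernel_size=kernel_size, groups=rows)

    def forward(self, x):
        B, C, H, W = x.shape
        # (B, C, H, W) -> (B, H, C, W) -> (B, H*C, W): rows outermost,
        # so each block of C merged channels belongs to one row.
        x = x.transpose(1, 2).reshape(B, H * C, W)
        y = self.conv(x)
        # (B, H*C, W') -> (B, H, C, W') -> (B, C, H, W')
        return y.reshape(B, H, C, -1).transpose(1, 2)


torch.manual_seed(0)
X = torch.randn(32, 15, 10, 100)
layer = RowwiseConv1d(channels=15, rows=10, kernel_size=3)
out = layer(X)
print('out.shape:', out.shape)

# Perturb row 0 only; rows 1..H-1 of the output should not change.
X2 = X.clone()
X2[:, :, 0, :] += 1.0
out2 = layer(X2)
rows_independent = torch.allclose(out[:, :, 1:, :], out2[:, :, 1:, :], atol=1e-6)
print('other rows unaffected:', rows_independent)
```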

Best.

K. Frank

Hey KFrank,

Wow, thanks so much for taking the time to make such an illustrative example! Always great to see how responsive people are in this forum. Your example solves my case, thanks!

Best, JZ