Hey,

I have H=10 groups of time series. Each group contains C=15 correlated time series, and each time series has length W=100. I feed the data to my model in batches X of shape BCHW = (32,15,10,100).

What I would like to do is to independently apply 1d-convolutions to each “row” 1,…,H in the batch. At first, I used a compact workaround:

```
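import torch.nn as nn
# k is the kernel width along the time axis W; X has shape (B, C, H, W)
layer = nn.Conv2d(15, 15, kernel_size=(1, k))
output = layer(X)  # shape (B, 15, H, W - k + 1)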
```

Here, the kernel has height 1, so it never spans more than one row, and the same weights are shared across all rows. It is therefore trained to extract patterns that appear in all the rows, which is what I wanted in order to capture features the rows have in common. However, to also treat the rows separately, I would now like to add an individual 1d-convolution for each row h = 1,…,H.
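To make the weight sharing concrete, here is a small check that the `(1,k)` Conv2d gives the same result as applying one and the same 1d-convolution to every row. This is only a sketch: the kernel size `k = 3`, the reduced batch size, and the names `w1d` and `rows_match` are my own assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, H, W, k = 4, 15, 10, 100, 3  # small batch; k = 3 is an assumed kernel size

torch.manual_seed(0)
X = torch.randn(B, C, H, W)
layer = nn.Conv2d(C, C, kernel_size=(1, k))
out2d = layer(X)  # shape (B, C, H, W - k + 1)

# Drop the height-1 kernel axis to obtain an equivalent 1d kernel.
w1d = layer.weight.squeeze(2)  # shape (C, C, k)

# Every row of the Conv2d output should equal a conv1d of that row
# with the same shared weights and bias.
rows_match = all(
    torch.allclose(
        out2d[:, :, h, :],
        F.conv1d(X[:, :, h, :], w1d, layer.bias),
        atol=1e-4,
    )
    for h in range(H)
)
```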

Thus, I would have to define H Conv1d layers, one for each row.

```
layers = []
for h in range(H):
    layers.append(nn.Conv1d(15, 15, kernel_size=k))
layers = nn.ModuleList(layers)
```

`layers[h]` would then correspond to the convolution applied to the h’th row of the matrix.

As I understand, this approach will increase the number of model parameters H-fold, as compared to the Conv2d approach.
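As a quick sanity check of that H-fold claim, the parameter counts can be worked out by hand. The kernel size `k = 3` is an assumed value, and bias terms are included because both layer types use a bias by default.

```python
C_in, C_out, H, k = 15, 15, 10, 3  # k = 3 is an assumed kernel size

# Conv2d with kernel (1, k): weight of shape (C_out, C_in, 1, k) plus one bias per output channel.
shared_params = C_out * C_in * 1 * k + C_out

# H separate Conv1d layers: each has a weight of shape (C_out, C_in, k) plus its own bias.
per_row_params = H * (C_out * C_in * k + C_out)

ratio = per_row_params // shared_params
```

With these numbers the shared Conv2d has 690 parameters and the H separate Conv1d layers have 6900 in total, so the ratio is exactly H = 10.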

The output of the Conv1d approach would be computed like this:

```
outputs = []
for h in range(H):
    # X[:, :, h, :] has shape (B, C, W); unsqueeze restores the row axis
    outputs.append(layers[h](X[:, :, h, :]).unsqueeze(2))
output = torch.cat(outputs, dim=2)
```

My question is how to parallelize this, i.e. how to apply the Conv1d layers to all rows of the batch simultaneously, so that the compute time stays approximately the same as before instead of increasing H-fold. Because of the larger number of parameters it will probably still take somewhat longer, but if the loop can be parallelized, it should save some time, right? How would that be possible?
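One candidate I am aware of is a grouped convolution: fold the rows into the channel axis and use a single `nn.Conv1d` with `groups=H`, so that each group of C channels gets its own kernel. Below is a minimal sketch that checks this against the loop. The kernel size `k = 3` and the names `grouped`, `X_flat`, `loop_out` and `grouped_out` are assumptions for illustration, and I have not benchmarked whether it is actually faster.

```python
import torch
import torch.nn as nn

B, C, H, W, k = 32, 15, 10, 100, 3  # k = 3 is an assumed kernel size

torch.manual_seed(0)
X = torch.randn(B, C, H, W)

# Reference: one Conv1d per row, applied in a Python loop.
layers = nn.ModuleList([nn.Conv1d(C, C, kernel_size=k) for _ in range(H)])
loop_out = torch.stack([layers[h](X[:, :, h, :]) for h in range(H)], dim=2)

# Candidate: a single grouped Conv1d with H groups of C channels each.
grouped = nn.Conv1d(H * C, H * C, kernel_size=k, groups=H)

# Copy the per-row weights so both versions compute the same function.
# Group h owns output channels [h*C, (h+1)*C), hence concatenation along dim 0.
with torch.no_grad():
    grouped.weight.copy_(torch.cat([l.weight for l in layers], dim=0))
    grouped.bias.copy_(torch.cat([l.bias for l in layers], dim=0))

# (B, C, H, W) -> (B, H, C, W) -> (B, H*C, W): channel index becomes h*C + c.
X_flat = X.permute(0, 2, 1, 3).reshape(B, H * C, W)
out = grouped(X_flat)

# Undo the folding: (B, H*C, W_out) -> (B, H, C, W_out) -> (B, C, H, W_out).
W_out = out.shape[-1]
grouped_out = out.reshape(B, H, C, W_out).permute(0, 2, 1, 3)

same = torch.allclose(loop_out, grouped_out, atol=1e-4)
```

The parameter count is the same as for the H separate layers; only the Python loop is replaced by one call. The weight copy is there purely to compare the two versions, and in a real model the grouped layer would simply be trained directly.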

Thanks!

Best, JZ