1D convolution on 4D input tensor to reduce the temporal dimension

I have stacked 100 sequential images into a tensor of size (100, 3, 16, 701). Given this 4D input tensor (excluding the batch dimension), I want to apply a 1D convolution with kernel size n (i.e., 100) along the temporal dimension to reduce it from n to 1, and then perform a 2D convolution on the resulting output of size (3, 16, 701).

I am using ResNet-18 for training.

Suggestions on how to set the parameters for the 1D and 2D convolutions in the network would be appreciated.

Thanks in advance.

Hi @Achu_Chandran,

I’m not a hundred percent sure I understood the data shapes correctly, but you have a sequence of 100 images, each with 3 channels and a spatial shape of 16 x 701?

In this case you could use a Conv3d for the temporal reduction and another Conv3d or Conv2d for the spatial operation.
Here is an example:

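import torch
import torch.nn as nn
import torch.nn.functional as F

B, S, C, H, W = 2, 100, 3, 16, 701   # batch, sequence length, channels, height, width
HIDDEN_C = 8        # not sure if you need a hidden channel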

t = torch.randn(B, S, C, H, W)
t = t.transpose(1, 2)   # swap seq and channels dim -> SHAPE: [B, C, S, H, W]

conv_temp = nn.Conv3d(3, HIDDEN_C, kernel_size=(100, 1, 1))
conv_spatial = nn.Conv3d(HIDDEN_C, 3, kernel_size=(1, 3, 3), padding=(0, 1, 1))

out = conv_spatial(F.relu(conv_temp(t)))
out = out.squeeze(2)    # remove seq dim -> SHAPE: [B, C, H, W]
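
Since the question asked for a 1D convolution over time, the following sketch checks that a Conv3d with kernel size (S, 1, 1) computes the same thing as a Conv1d applied to every pixel's time series. The small tensor sizes and the weight-copying step are assumptions made purely for illustration; they are not part of the example above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, S, C, H, W = 2, 10, 3, 4, 5   # small sizes chosen for illustration only
HIDDEN_C = 8

t = torch.randn(B, C, S, H, W)   # already in [B, C, S, H, W] layout

conv3d = nn.Conv3d(C, HIDDEN_C, kernel_size=(S, 1, 1))
conv1d = nn.Conv1d(C, HIDDEN_C, kernel_size=S)

with torch.no_grad():
    # Share parameters: [HIDDEN_C, C, S, 1, 1] -> [HIDDEN_C, C, S]
    conv1d.weight.copy_(conv3d.weight.squeeze(-1).squeeze(-1))
    conv1d.bias.copy_(conv3d.bias)

    out3d = conv3d(t).squeeze(2)                      # [B, HIDDEN_C, H, W]

    # Treat each pixel as its own sequence: [B, C, S, H, W] -> [B*H*W, C, S]
    seq = t.permute(0, 3, 4, 1, 2).reshape(B * H * W, C, S)
    out1d = conv1d(seq)                               # [B*H*W, HIDDEN_C, 1]
    out1d = out1d.reshape(B, H, W, HIDDEN_C).permute(0, 3, 1, 2)

same = torch.allclose(out3d, out1d, atol=1e-5)
print(out3d.shape, same)
```

The Conv3d form avoids the permute and reshape round trip, which is presumably why it is the more convenient choice here.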


If I understood your data shapes wrong, please correct me.



Hi @Caruso,
Thanks for your response!

The data size is correct and the example works completely fine with my inputs. :smile:
Thanks again!

By the way, I did not understand the purpose of the hidden channel here.

Well, I just added it for simplicity, but you can think of it as an extra step in the reduction of the temporal data. If HIDDEN_C=3, the number of features before and after conv_spatial is identical, and the only reduction happens in the first convolution. If HIDDEN_C>3, the reduction is softer: in terms of shape, the output of conv_temp is an intermediate state between our input and the target output. But the question is whether this really makes a difference.
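
To make the shape argument concrete, here is a rough sketch that prints the intermediate shape, the output shape, and the parameter count of the two convolutions for a few values of the hidden channel count. The candidate values (3, 8, 16), the reduced H and W, and the helper count_params are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, S, C, H, W = 1, 100, 3, 4, 8   # H and W shrunk for speed (assumed values)

def count_params(*modules):
    """Total number of learnable parameters in the given modules (hypothetical helper)."""
    return sum(p.numel() for m in modules for p in m.parameters())

t = torch.randn(B, C, S, H, W)
summary = {}

for hidden_c in (3, 8, 16):
    conv_temp = nn.Conv3d(C, hidden_c, kernel_size=(S, 1, 1))
    conv_spatial = nn.Conv3d(hidden_c, C, kernel_size=(1, 3, 3), padding=(0, 1, 1))

    with torch.no_grad():
        mid = F.relu(conv_temp(t))            # [B, hidden_c, 1, H, W]
        out = conv_spatial(mid).squeeze(2)    # [B, C, H, W]

    summary[hidden_c] = (tuple(mid.shape), tuple(out.shape),
                         count_params(conv_temp, conv_spatial))
    print(hidden_c, *summary[hidden_c])
```

The output shape is the same in every case; only the width of the intermediate tensor and the number of parameters change with the hidden channel count.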

If HIDDEN_C is close to 3, there is also the possibility ‘that information is lost, where certain points of the manifold collapse into each other’, to quote MobileNetV2. You can run the code below and you will see that, if the ‘hidden’ dimension is only a small multiple of the original dimension, some information is lost, but less so in higher dimensions. So this could be another reason to choose HIDDEN_C > 3, but in the end, just test what works best ^^

import math
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt


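ns = [2, 3, 6, 10, 15, 30]   # expansion factors: hidden dim = D * n
MAX_N = ns[-1]
N_SEEDS = 3   # number of random projections to plot; assumed value, it was not defined in the original post

D = 2
x = torch.empty(300, D)

# Build a 2D spiral as the input "manifold"
for i in range(x.shape[0]):
    x[i] = torch.tensor([i * math.cos(i/10), i * math.sin(i/10)])

# squeeze=False keeps axs two-dimensional even when N_SEEDS == 1
fig, axs = plt.subplots(N_SEEDS, len(ns) + 1, figsize=(14, 6), squeeze=False)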

for seed in range(N_SEEDS):
    w = torch.randn(D, D*MAX_N)

    axs[seed, 0].set_title("original")
    axs[seed, 0].set_aspect('equal')
    axs[seed, 0].set_axis_off()
    axs[seed, 0].plot(x[:, 0], x[:, 1])

    for i, n in enumerate(ns, start=1):
        w_ = w[:, :D*n]
        x_up = F.relu(x @ w_)
        x_down = x_up @ w_.pinverse()

        axs[seed, i].set_title(f"Output/dim={n}")
        axs[seed, i].set_aspect('equal')
        axs[seed, i].set_axis_off()
        axs[seed, i].plot(x_down[:, 0], x_down[:, 1])

plt.show()
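
To put a rough number on what the plots show, the sketch below measures how well the direction of each point survives the expand, ReLU, project-back round trip, using the mean cosine similarity between the original and reconstructed points. Cosine similarity is used instead of a raw distance because the reconstruction can come back rescaled while still preserving the spiral's shape. The metric, the number of trials N_TRIALS, and drawing a fresh random matrix per n (rather than slicing one large matrix) are assumptions, not part of the code above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D = 2
ns = [2, 3, 6, 10, 15, 30]
N_TRIALS = 20   # assumed number of random projections to average over

# Same spiral as above; start at i=1 to skip the zero vector at the origin.
i = torch.arange(1, 300, dtype=torch.float32)
x = torch.stack([i * torch.cos(i / 10), i * torch.sin(i / 10)], dim=1)

def mean_cosine(n):
    """Average cosine similarity between x and its reconstruction for hidden dim D*n."""
    scores = []
    for _ in range(N_TRIALS):
        w = torch.randn(D, D * n)
        x_down = F.relu(x @ w) @ torch.linalg.pinv(w)
        scores.append(F.cosine_similarity(x, x_down, dim=1).mean())
    return torch.stack(scores).mean().item()

results = {n: mean_cosine(n) for n in ns}
for n, score in results.items():
    print(f"n={n:2d}  mean cosine similarity = {score:.3f}")
```

Values closer to 1 mean the directions of the points are better preserved; one would typically expect the larger hidden dimensions to score higher, in line with the plots.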


I got it. Thanks for the explanation, @Caruso.