I want to reduce a tensor of shape (32, 32, 26, 40, 10) to (32, 32, 512). Basically, I want to keep the batch size and sequence length and embed the last three dimensions into 512.
How can I do this reduction using conv3d?
Thanks!
I don’t know how you are interpreting the input data, but I guess you are treating the channel dimension in dim1 as the “sequence length”?
3D convolutions expect an input in the shape [batch_size, channels, depth, height, width].
I don’t think using a plain nn.Conv3d would work directly, as you are increasing the depth size from 26 to 512.
Maybe you want to flatten the depth, height, width into a single feature dimension and apply an nn.Linear layer instead?
import torch
import torch.nn as nn

x = torch.randn(32, 32, 26, 40, 10)
# Flatten depth, height, width into a single feature dimension: [32, 32, 26*40*10]
x = x.view(x.size(0), x.size(1), -1)
linear = nn.Linear(26 * 40 * 10, 512)
out = linear(x)
print(out.shape)
# torch.Size([32, 32, 512])
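That said, if you really want to stick with nn.Conv3d, one workaround would be to fold the sequence dimension into the batch, add a channel dimension of 1, and use a kernel spanning the full (26, 40, 10) volume, so that each of the 512 output channels reduces the whole block to a single value. This is a sketch under those assumptions; mathematically it is equivalent to the nn.Linear approach:

```python
import torch
import torch.nn as nn

x = torch.randn(32, 32, 26, 40, 10)
# Fold the sequence dim into the batch and add a singleton channel dim:
# [batch * seq, 1, depth, height, width] = [1024, 1, 26, 40, 10]
x_conv = x.view(-1, 1, 26, 40, 10)

# A kernel covering the entire volume collapses the spatial dims to 1x1x1,
# so each of the 512 output channels is a learned weighted sum (plus bias)
# over all 26*40*10 input values.
conv = nn.Conv3d(in_channels=1, out_channels=512, kernel_size=(26, 40, 10))
out = conv(x_conv)              # [1024, 512, 1, 1, 1]
out = out.view(32, 32, 512)     # restore batch and sequence dims
print(out.shape)
# torch.Size([32, 32, 512])
```

Since the same conv weights are applied to every sequence position, this shares parameters across the sequence exactly like the linear layer does.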
Hi,
Yes, the sequence length is interpreted as the channel dimension. Using a linear layer on the flattened d, h, w is a solution.
Thanks!