I want to reduce a tensor of shape (32, 32, 26, 40, 10) to (32, 32, 512). Basically, I want to keep the batch size and sequence length and embed the last three dimensions into 512.
How can I do this reduction using conv3d?
Thanks!
I don’t know how you are interpreting the input data, but I guess you are treating the channel dimension in dim1 as the “sequence length”?
3D convolutions expect an input in the shape [batch_size, channels, depth, height, width].
I don’t think using a plain nn.Conv3d would work directly, as you are increasing the depth size from 26 to 512.
Maybe you want to flatten the depth, height, width into a single feature dimension and apply an nn.Linear layer instead?
import torch
import torch.nn as nn

x = torch.randn(32, 32, 26, 40, 10)
# Flatten depth, height, width into a single feature dimension: [32, 32, 26*40*10]
x = x.view(x.size(0), x.size(1), -1)
linear = nn.Linear(26 * 40 * 10, 512)
out = linear(x)
print(out.shape)
# torch.Size([32, 32, 512])
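That said, if you really want to stick with nn.Conv3d, one workaround would be to fold the sequence dimension into the batch, add a channel dimension of 1, and use a kernel spanning the full (26, 40, 10) volume, so that each of the 512 output channels reduces the whole block to a single value. This is a sketch under those assumptions; mathematically it is equivalent to the nn.Linear approach:

```python
import torch
import torch.nn as nn

x = torch.randn(32, 32, 26, 40, 10)
# Fold the sequence dim into the batch and add a singleton channel dim:
# [batch * seq, 1, depth, height, width] = [1024, 1, 26, 40, 10]
x_conv = x.view(-1, 1, 26, 40, 10)

# A kernel covering the entire volume collapses the spatial dims to 1x1x1,
# so each of the 512 output channels is a learned weighted sum (plus bias)
# over all 26*40*10 input values.
conv = nn.Conv3d(in_channels=1, out_channels=512, kernel_size=(26, 40, 10))
out = conv(x_conv)              # [1024, 512, 1, 1, 1]
out = out.view(32, 32, 512)     # restore batch and sequence dims
print(out.shape)
# torch.Size([32, 32, 512])
```

Since the same conv weights are applied to every sequence position, this shares parameters across the sequence exactly like the linear layer does.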
Hi,
Yes, the sequence length is interpreted as the channel dimension. Using a linear layer on the flattened d, h, w is a solution.
Thanks!