I have the following issue with Conv2d in PyTorch: I am trying to feed in a tensor with shape “in_shape = (64, 3, 200, 220)”, and I get an error that seems to say it expects this shape: “[2, 64, 3, 200, 220]”. Why would it want an extra leading dimension of size 2? Any ideas?

Well, I would say that PyTorch is not saying that.

Could you paste the exact error and the layer you think it comes from?

PyTorch 2D convs work with the format you are describing (B, C, H, W).
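As a quick check of the (B, C, H, W) convention, here is a minimal sketch that pushes a batch through a layer configured like the first convolution in this thread. The batch size of 4 is an arbitrary choice to keep the example light.

```python
import torch
from torch import nn

# Same configuration as the first conv layer discussed in this thread.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(4, 4))

# (B, C, H, W): batch of 4 images, 3 channels, 200 x 220 pixels.
x = torch.randn(4, 3, 200, 220)
y = conv(x)

# No padding and stride 1, so each spatial size shrinks by kernel_size - 1.
print(y.shape)  # torch.Size([4, 64, 197, 217])
```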

I have solved it. I was providing more dimensions than were needed, and for some reason conv2d ended up with a 2 in front of the tensor’s shape tuple.
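A likely source of the leading 2, as far as I can tell from reading torchsummary, is that `summary` builds a random dummy batch of size 2 from `input_size`, so `input_size` should not include the batch dimension. The sketch below imitates that with a hypothetical helper `dummy_batch` (not part of torchsummary) so it runs with plain PyTorch.

```python
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(4, 4))


def dummy_batch(input_size):
    # Hypothetical helper: prepends a batch dimension of 2, which is what
    # torchsummary appears to do with its input_size argument (assumption).
    return torch.rand(2, *input_size)


good = dummy_batch((3, 200, 220))      # shape (2, 3, 200, 220)
bad = dummy_batch((64, 3, 200, 220))   # shape (2, 64, 3, 200, 220)

good_out = conv(good)
print(good_out.shape)

raised = False
try:
    conv(bad)  # 5D input is not valid for a 2D convolution
except RuntimeError as err:
    raised = True
    print("conv2d rejected the 5D input:", err)
```

Under that assumption, the size reported in the error is the shape the layer received, not the shape it expected.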

But right now I am having an issue connecting a CNN and a GRU together for a speaker identification task. Any idea how to do that?

You usually use max pooling or average pooling to obtain a feature vector. Then either the dimensions already match, or you can use a linear layer to go from the N features of the CNN to the M features of the GRU.
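A minimal sketch of that idea, assuming global average pooling and made-up sizes (N = 128 CNN channels, M = 256 GRU input features):

```python
import torch
from torch import nn

# Stand-in for a CNN output of shape (B, C, H, W); the sizes are made up.
feats = torch.randn(8, 128, 11, 12)

# Global average pooling collapses H and W, leaving one value per channel.
pool = nn.AdaptiveAvgPool2d(1)
vec = pool(feats).flatten(1)        # (B, N) with N = 128

# Linear layer mapping the N CNN features to the M features the GRU expects.
to_gru = nn.Linear(128, 256)
projected = to_gru(vec)             # (B, M) with M = 256
print(vec.shape, projected.shape)
```

This gives a single vector per sample. To feed a GRU, one axis has to be kept as the time axis instead of being pooled away.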

Yes, I am trying to do that, but I get an error saying the RNN wants a 3D tensor while the CNN gives a 4D one.

So you have something like

B,C,T,H,W

then you pool over the spatial dimensions (H, W) and get

B,C,T,1,1

squeeze

B,C,T

then you need to permute (depending on whether you use the batch_first flag or not)

T,B,C

Linear layer if necessary

T,B,C’
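The shape steps above can be sketched as follows. The sizes and the choice of average pooling are assumptions for illustration. Pooling over both spatial axes leaves two singleton dimensions, which are squeezed before the permute.

```python
import torch
from torch import nn

B, C, T, H, W = 4, 128, 10, 6, 6            # made-up sizes
x = torch.randn(B, C, T, H, W)

# Keep the time axis (None) and average over H and W.
pool = nn.AdaptiveAvgPool3d((None, 1, 1))
x = pool(x)                                  # (B, C, T, 1, 1)
x = x.squeeze(-1).squeeze(-1)                # (B, C, T)

# nn.GRU defaults to batch_first=False, so it expects (T, B, features).
x = x.permute(2, 0, 1)                       # (T, B, C)

# Optional linear layer to change the feature size from C to C'.
x = nn.Linear(C, 256)(x)                     # (T, B, C')

gru = nn.GRU(input_size=256, hidden_size=64)
out, h_n = gru(x)                            # nn.GRU returns (output, h_n)
print(out.shape, h_n.shape)  # torch.Size([10, 4, 64]) torch.Size([1, 4, 64])
```

With `batch_first=True` the permute would be `x.permute(0, 2, 1)` instead, giving (B, T, C).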

I’m doing something like this:

import torch

from torch import nn, optim

# device was not defined in the posted snippet; assumed here

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a sequential model

model = nn.Sequential()

# Add convolutional and pooling layers

model.add_module('Conv_1', nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(4,4)))

model.add_module('LRelu_1', nn.LeakyReLU())

model.add_module('BatchN_1', nn.BatchNorm2d(64))

model.add_module('AvgPool_1', nn.AvgPool2d(kernel_size=4))

model.add_module('BatchN_2', nn.BatchNorm2d(64))

model.add_module('Conv_2', nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(4,4)))

model.add_module('LRelu_2', nn.LeakyReLU())

model.add_module('BatchN_3', nn.BatchNorm2d(128))

model.add_module('AvgPool_2', nn.AvgPool2d(kernel_size=4))

model.add_module('BatchN_4', nn.BatchNorm2d(128))

model.add_module('Conv_3', nn.Conv2d(in_channels=128, out_channels=128, kernel_size=(3,3)))

model.add_module('LRelu_3', nn.LeakyReLU())

# Note: the name 'BatchN_3' is reused, so this replaces the earlier entry instead of adding a layer

model.add_module('BatchN_3', nn.BatchNorm2d(128))

model.add_module('AvgPool_3', nn.AvgPool2d(kernel_size=2))

model.add_module('BatchN_5', nn.BatchNorm2d(128))

model.add_module('Conv_4', nn.Conv2d(in_channels=128, out_channels=64, kernel_size=(2,2)))

model.add_module('LRelu_4', nn.LeakyReLU())

# Add a Flatten layer to the model

#model.add_module('Flatten', nn.Flatten())

# Add a Linear layer with 256 units and LeakyReLU activation

model.add_module('Linear_1', nn.Linear(in_features=4, out_features=256, bias=True))

model.add_module('LRelu_5', nn.LeakyReLU())

model.add_module('GRU_1', nn.GRU(256,64,32))

# Add the last Linear layer.

model.add_module('Linear_2', nn.Linear(in_features=256, out_features=183, bias=True))

model.add_module('Out_activation', nn.Softmax(-1)) #??? -1

model = model.to(device)

from torchsummary import summary

in_shape = (3, 200, 220)

summary(model, input_size=(in_shape))

If you could help me, that would be a fantastic contribution to the PyTorch forum, because there is nothing here about connecting a CNN to a GRU.

Now with this code the error is: **RuntimeError**: input must have 3 dimensions, got 4
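The error is consistent with the shapes in the posted model. `nn.Linear(in_features=4, ...)` acts only on the last axis, so the tensor reaching `nn.GRU` is still 4D. In addition, `nn.GRU` returns a tuple `(output, h_n)`, which `nn.Sequential` would hand unchanged to the next layer, and `nn.GRU(256, 64, 32)` requests 32 stacked layers, which may not be the intent. One possible way around this is a small module with an explicit `forward`. The sketch below is only an illustration under assumptions: the class name `CnnGru` is hypothetical, the conv stack is shortened, the width axis is treated as time, and the height axis is averaged away.

```python
import torch
from torch import nn


class CnnGru(nn.Module):
    """Hypothetical wrapper: CNN -> (B, T, C) sequence -> GRU -> classifier."""

    def __init__(self, n_classes=183, hidden=64):
        super().__init__()
        # Shortened stand-in for the convolutional stack from the post.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4),
            nn.LeakyReLU(),
            nn.BatchNorm2d(64),
            nn.AvgPool2d(kernel_size=4),
            nn.Conv2d(64, 128, kernel_size=4),
            nn.LeakyReLU(),
            nn.BatchNorm2d(128),
            nn.AvgPool2d(kernel_size=4),
        )
        self.proj = nn.Linear(128, 256)   # CNN channels -> GRU input features
        self.act = nn.LeakyReLU()
        self.gru = nn.GRU(input_size=256, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        feats = self.cnn(x)               # (B, C, H, W)
        feats = feats.mean(dim=2)         # average over H -> (B, C, W)
        seq = feats.permute(0, 2, 1)      # (B, T, C), assuming W is the time axis
        seq = self.act(self.proj(seq))    # (B, T, 256)
        _, h_n = self.gru(seq)            # h_n: (num_layers, B, hidden)
        return self.out(h_n[-1])          # logits of shape (B, n_classes)


model = CnnGru()
logits = model(torch.randn(2, 3, 200, 220))
print(logits.shape)  # torch.Size([2, 183])
```

The sketch returns raw logits instead of applying `nn.Softmax`, since `nn.CrossEntropyLoss` expects unnormalized scores. A softmax over the last dimension can be applied at inference time if probabilities are needed.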