Expected 4-dimensional input, got 3-dimensional input

I have a 3-dimensional input tensor of size (1, 128, 100) when the agent selects an action and (batch_size, 128, 100) when the agent trains. The input is a sequence of words that is tokenized; each token is mapped to a vector by a Word2Vec model, and the vectors are concatenated into a tensor. So 128 is the number of tokens and 100 is the W2V vector size. With this convolutional network:

import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, hidden_dim, action_dim):
        super(Actor, self).__init__()

        self.action_layer = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=4),
            nn.Linear(64, action_dim),
        )

    def forward(self, state):
        action_probs = self.action_layer(state)
        return action_probs

I got this error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 1, 2, 2], but got 3-dimensional input of size [1, 128, 100] instead 

Also, I am confused about some of the parameter values. Is in_channels=1 correct for this type of input? Please guide me on how to fix this error.
Thanks in advance

nn.Conv2d layers expect a 4-dimensional input tensor in the shape [batch_size, channels, height, width]. Based on your error and description I guess the channel dimension is missing, so you could add it via x = x.unsqueeze(1) before passing the tensor to the model.
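To make the suggestion concrete, here is a minimal sketch that uses only the two conv layers from the question and a random tensor in place of the Word2Vec features. The nn.Linear layer is left out on purpose, since it would need a flatten or pooling step first and that is a separate issue from this error. After unsqueeze(1) the channel dimension has size 1, which is why in_channels=1 matches this input.

```python
import torch
import torch.nn as nn

# Only the two conv layers from the question; nn.Linear is omitted here
# because it would need a flatten or pooling step in front of it.
convs = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32, kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=4),
)

state = torch.randn(1, 128, 100)  # [batch_size, tokens, embedding_dim]
state = state.unsqueeze(1)        # [batch_size, 1, tokens, embedding_dim]
out = convs(state)

print(state.shape)  # torch.Size([1, 1, 128, 100])
print(out.shape)    # torch.Size([1, 64, 124, 96])
```

The same unsqueeze(1) call works for the training case, where the first dimension is batch_size instead of 1.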


@ptrblck, the tensor that I want to pass to the CNN has the shape [32, 3, 512, 512]. It holds 32 slices of one image, each of which has three channels. However, the CNN expects a 4-dimensional input tensor [B,C,H,W]. How can I change my tensor [32, 3, 512, 512] so that it can be passed as a 4D input in the expected order [B,C,H,W]?

If each slice of the image should be treated as a separate sample, your tensor shape would already be right.

No, they should not be treated as separate samples; all of them together should be treated as 1 sample. That’s why I want to squish the first two dimensions to make it compliant with the expected input order [B,C,H,W].

Since you’ve mentioned “slices” I would guess you want to treat this dimension as the “depth” then?
If so, you should use a 3D model and pass the input as [batch_size, channels, depth, height, width] via:

import torch

x = torch.randn(32, 3, 512, 512)
x = x.permute(1, 0, 2, 3).contiguous().unsqueeze(0)
print(x.shape)
# > torch.Size([1, 3, 32, 512, 512])
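As a rough sketch of how the permuted tensor would then be consumed, the snippet below passes it through a single nn.Conv3d layer. The layer itself is hypothetical; its output channels, kernel size, and stride are arbitrary values picked only to show that the [batch_size, channels, depth, height, width] layout is accepted.

```python
import torch
import torch.nn as nn

x = torch.randn(32, 3, 512, 512)                     # [slices, channels, H, W]
x = x.permute(1, 0, 2, 3).contiguous().unsqueeze(0)  # [1, channels, depth, H, W]

# Hypothetical layer: out_channels, kernel_size and stride are arbitrary.
conv3d = nn.Conv3d(in_channels=3, out_channels=4, kernel_size=3, stride=2, padding=1)
out = conv3d(x)

print(x.shape)    # torch.Size([1, 3, 32, 512, 512])
print(out.shape)  # torch.Size([1, 4, 16, 256, 256])
```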

Yes, I will be using a 3D CNN later, but at the moment I want to run ResNet as a baseline and have 32 slices per sample. [32,3,512,512] is the tensor dimension, and I want to squish the first two dimensions to make it a 3D tensor, so that I can pass it as [B,C,H,W] in the network.

I’m not sure how the description fits the shape, but assuming you want to move the slices into the channel dimension, you could use x = x.view(-1, 512, 512).unsqueeze(0) to get a tensor of [1, 96, 512, 512], which would then of course not work anymore in a standard ResNet model, since 3 input channels are expected. If this is your use case, you could replace the first conv layer with a new one accepting 96 channels.
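A minimal sketch of that idea follows. TinyStem is a hypothetical stand-in used here so the example runs without torchvision; it only mimics the first layer of a ResNet-style model (a 7x7 conv with 3 input channels) followed by pooling and a linear head. In a real ResNet you would swap the model's own first conv layer in the same way. Note that the replacement layer is freshly initialized, so any pretrained weights for that layer would not carry over.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a ResNet-style model: first conv takes 3 channels.
class TinyStem(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, 10)

    def forward(self, x):
        x = self.pool(self.conv1(x))
        return self.fc(torch.flatten(x, 1))

slices = torch.randn(32, 3, 512, 512)          # 32 slices, 3 channels each
x = slices.view(-1, 512, 512).unsqueeze(0)     # fold slices into channels
print(x.shape)  # torch.Size([1, 96, 512, 512])

model = TinyStem()
# Replace the first conv layer with one that accepts 96 input channels.
model.conv1 = nn.Conv2d(x.size(1), 64, kernel_size=7, stride=2, padding=3, bias=False)
out = model(x)
print(out.shape)  # torch.Size([1, 10])
```

With this view, channel index s * 3 + c of the folded tensor holds channel c of slice s, so the channels of each slice stay next to each other.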