Expected 4-dimensional input for 4-dimensional weight [32, 3, 8, 8], but got 3-dimensional input of size [3, 96, 96] instead

I am a beginner in RL and was trying to export a Stable Baselines3 model (a CnnPolicy trained on CarRacing-v0 from the gym module) when I ran into this issue.


And this is the code:

import numpy as np
import torch as th
from torch import nn as nn
import torch.nn.functional as F
from torch import tensor
from stable_baselines3.common.vec_env import VecTransposeImage


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.features_extractor = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(start_dim=1, end_dim=-1),
            nn.Linear(in_features=4096, out_features=512, bias=True),
            nn.ReLU()
        )
        self.action_net = nn.Sequential(
            nn.Linear(in_features=512, out_features=3, bias=True),
            nn.ReLU()
        )
    
    def forward(self, x):
        x = self.features_extractor(x)
        x = self.action_net(x)
        x = x.argmax()

        return x


def getMove(obs):
    model = Net()
    model = model.float()
        model.load_state_dict(state_dict)  # state_dict is loaded elsewhere (the exported SB3 policy weights)
    model = model.to('cpu')
    model = model.eval()
    obs = obs.copy()
    obs = VecTransposeImage.transpose_image(obs)
    obs = th.as_tensor(obs).to('cpu')
    obs = obs.float() / 255
    obs = obs.float()
    action = model(obs)

    return action

How can I fix it?

Looks like your first convolution
nn.Conv2d(in_channels=3, out_channels=32, kernel_size=8, stride=4)
receives an input with the wrong shape.
What is your batch size? It looks like you are feeding one 96x96 image with three channels at a time (i.e. batch_size=1, currently). If you want to keep batch_size=1, I think you can write

x = x.unsqueeze(0)
x = self.features_extractor(x)
# ... 

in your forward function. The unsqueeze will reshape your x from (3, 96, 96) to (1, 3, 96, 96). Conv2d takes a 4-dimensional input (B, C, H, W), so B=1 here. See: Conv2d — PyTorch 1.11.0 documentation
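
For reference, here is a minimal sketch of what that change does to the shapes (the layer sizes are taken from your Net above):

import torch as th
from torch import nn

x = th.zeros(3, 96, 96)    # a single observation, no batch dimension
print(x.shape)             # torch.Size([3, 96, 96])

x = x.unsqueeze(0)         # add the batch dimension B=1
print(x.shape)             # torch.Size([1, 3, 96, 96])

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=8, stride=4)
print(conv(x).shape)       # torch.Size([1, 32, 23, 23])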

Also, I am not sure about the argmax in your forward function. I don’t think argmax is differentiable, so it would lead to non-existent gradients (I think), and you could not compute the gradient of your reward / loss function w.r.t. the model params.
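
A quick way to see this (just an illustrative check, using the same th alias as in your code):

import torch as th

x = th.randn(3, requires_grad=True)
y = x.argmax()                    # returns an integer index tensor
print(y.dtype, y.requires_grad)   # torch.int64 False -- no gradient flows back through argmax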

I am getting this error when doing x = x.unsqueeze(0)

Can you print x, x.shape, and type(x) at the beginning of the function?

Printing x gives a large output; printing x.shape and type(x) gave "torch.Size([3, 96, 96]) <class 'torch.Tensor'>".

Then I don’t know why it doesn’t work; it should. Which PyTorch version are you using? Maybe still try x = torch.unsqueeze(x, dim=0)
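
In case it helps, here is a minimal sketch of getMove with the batch dimension added outside the model (assuming, as in your original snippet, that state_dict holds the weights exported from the SB3 policy):

import torch as th
from stable_baselines3.common.vec_env import VecTransposeImage

def getMove(obs):
    model = Net()
    model.load_state_dict(state_dict)                     # state_dict assumed to be defined elsewhere
    model = model.to('cpu').eval()
    obs = VecTransposeImage.transpose_image(obs.copy())   # (96, 96, 3) -> (3, 96, 96)
    obs = th.as_tensor(obs).float() / 255
    obs = th.unsqueeze(obs, dim=0)                        # (3, 96, 96) -> (1, 3, 96, 96)
    with th.no_grad():                                    # inference only, no gradients needed
        action = model(obs)
    return action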