RuntimeError: Need input of dimension 4 and input.size[1] == 4 but got input to be of shape: [1 x 1 x 84 x 84]

I’m trying to implement DQN, but I’m having trouble processing the image before feeding it into my Q-network.

import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.transforms as T

transform = T.Compose([
    T.ToPILImage(),
    T.Lambda(lambda x: x.convert('L')),              # convert frame to grayscale
    T.Scale((84, 84), interpolation=T.Image.CUBIC),  # resize to 84 x 84
    T.ToTensor()
])

def process(img):
    # Wrap the (1, 84, 84) tensor in a Variable and add a batch dimension -> (1, 1, 84, 84)
    return Variable(torch.Tensor(transform(img))).unsqueeze(0)
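Printing the shape of one processed frame confirms it comes out with a single channel, the same [1 x 1 x 84 x 84] that appears in the error:

# Sanity check: the processed observation has one channel, not four.
state = process(env.reset())
print(state.size())   # torch.Size([1, 1, 84, 84])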

class Qnet(nn.Module):
    def __init__(self, num_actions):
        super(Qnet, self).__init__()
        # (84 - 8) / 4  + 1 = 20 (len,width) output size
        self.cnn1 = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=8, stride=4, padding=0)
        self.relu1 = nn.ReLU()
        # (20 - 4) / 2 + 1 = 9 (len,width) output size
        self.cnn2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2, padding=0)
        self.relu2 = nn.ReLU()
        # (9 - 3) / 1 + 1 = 7 (len, width) output size
        self.cnn3 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=0)
        self.relu3 = nn.ReLU()
        # fully connected layer
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, num_actions)

    def forward(self, x):
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.cnn3(out)
        out = self.relu3(out)
        # Flatten from (batch_size, 64, 7, 7) to (batch_size, 64*7*7)
        out = out.view(out.size(0), -1)
        out = self.fc1(out)
        return self.fc2(out)

When running my agent:

# my network:
Q = Qnet(env.action_space.n)

# policy
def e_greedy(state):
    epsilon = 1 / epsilon_step  # epsilon_step is defined elsewhere (not shown here)
    if np.random.random() < epsilon:
        return np.random.choice(range(env.action_space.n), 1)[0]
    else:
        # PyTorch modules expect batched input; process() already added the batch dimension
        qvalues = Q(state)
        maxq, actions = torch.max(qvalues, 1)
        return actions[0].data[0]

for episode in range(10000):
    state = env.reset()
    state = process(state)
    while True:
        env.render()
        action = e_greedy(state)

When I call my network on the processed image state, I get this error:

Traceback (most recent call last):
  File "dqn.py", line 89, in <module>
    action = e_greedy(state)
  File "dqn.py", line 69, in e_greedy
    qvalues = Q(state)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "dqn.py", line 48, in forward
    out = self.cnn1(x)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 254, in forward
    self.padding, self.dilation, self.groups)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 52, in conv2d
    return f(input, weight, bias)
RuntimeError: Need input of dimension 4 and input.size[1] == 4 but got input to be of shape: [1 x 1 x 84 x 84] at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/THNN/generic/SpatialConvolutionMM.c:47

I don’t understand what I’m doing wrong. I’m passing in a 4D tensor of the right size. Initially I thought the network only expects batches instead of a single input, but even if I add an extra image to the batch (dimension = [2 x 1 x 84 x 84]), I get the same error. Thanks!

The network does expect batches, but that isn’t the problem here.

Your input has to have 4 channels, because cnn1 was declared with in_channels=4; the error is complaining about input.size[1], which is the channel dimension, not the batch dimension.

So pass in an input of shape 1 x 4 x H x W.
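In DQN that 4-channel input usually comes from stacking the last four preprocessed frames along the channel dimension. Here is a minimal sketch of one way to do it, reusing the process() function from your code; the FrameStack helper itself is just an illustration, not something from the original post:

from collections import deque
import torch

class FrameStack:
    """Keeps the last k processed frames and concatenates them into one (1, k, 84, 84) input."""
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At the start of an episode, fill the buffer by repeating the first frame k times.
        for _ in range(self.k):
            self.frames.append(frame)
        return self.state()

    def push(self, frame):
        # After each env.step(), append the newest processed frame.
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Concatenate along dim 1 (channels): k tensors of (1, 1, 84, 84) -> (1, k, 84, 84)
        return torch.cat(list(self.frames), 1)

# Usage with the code from the question:
# stack = FrameStack(4)
# state = stack.reset(process(env.reset()))    # shape (1, 4, 84, 84), matches in_channels=4
# next_state = stack.push(process(next_obs))   # after each env.step(action)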

Thank you! I wasn’t understanding the dimensions and didn’t realize I had set the first layer to expect 4 input channels.