Prediction for DQN official tutorial

Majid_Shirazi · May 29, 2020, 8:46pm

Hi everyone,

I am using the PyTorch DQN tutorial for RL in my own gym environment. Training completes without problems, but when I try to use the agent for prediction I get this error: "RuntimeError: size mismatch, m1: [1 x 288], m2: [2592 x 7] at /opt/conda/conda-bld/pytorch_1579022119164/work/aten/src/TH/generic/THTensorMath.cpp:1366".

For prediction, I do the following after loading the trained agent:

policy_net.load_state_dict(torch.load('agent1.pth'))
net = policy_net

# Since I am only doing prediction, I removed the eps values (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html).

def select_test_action(state):
    # Greedy action only: no epsilon exploration at test time.
    with torch.no_grad():
        # t.max(1) returns the largest column value of each row.
        # The second element of the max result is the index of that
        # maximum, so we pick the action with the larger expected reward.
        return net(state).max(1)[1].view(1, 1)

LOADED NET:
1st:  DQN(
  (conv1): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
  (bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (head): Linear(in_features=2592, out_features=7, bias=True)
)

and the prediction loop:

action =  np.array([0])
rewards_agent_1 = []
rewards_agent_2 = []
env.reset()
env.render(mode='human')
counter = 0
one = True
#last_screen = get_screen()
#current_screen = get_screen()
#state = current_screen - last_screen
while one:
    state, reward, done, info = env.step(action.item())
    state_np = np.array(state)
    #print(state_np.shape)
    images_np = np.full((1, 3, len(state_np), len(state_np[0])), 0)
    images_np[0, 0] = state_np[:,:,0]
    images_np[0, 1] = state_np[:,:,1]
    images_np[0, 2] = state_np[:,:,2]
    #print(images_np.shape)
    state_torch = torch.from_numpy(images_np)
    print('input: ', state_torch.shape)
    print ('Starting prediction for agent 1!')
    action = select_test_action(state_torch.float())

Before the linear output layer, the outputs of my two conv layers are:
input: torch.Size([1, 3, 48, 48])
Starting prediction for agent 1!
conv1: torch.Size([1, 32, 9, 9])
conv2: torch.Size([1, 32, 3, 3])

ptrblck · May 31, 2020, 7:14am

The shapes are a bit weird. The linear layer is raising the error, but conv3 should not work either, since its kernel size of 5 is bigger than the spatial size of its input (3x3).

Are you sure these are the output shapes of conv1 and conv2? They look like the output shapes of conv2 and conv3, which would also match the input shape.
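
One way to check this by hand is to trace the spatial sizes through the conv stack. The sketch below assumes the three layers from the printed model (kernel size 5, stride 2, no padding or dilation) and the 48x48 input from the log. The helper names `conv_out` and `trace_shapes` are made up for this illustration and are not part of the tutorial.

```python
def conv_out(size, kernel_size=5, stride=2):
    """Output size along one dimension of an unpadded, undilated Conv2d."""
    if size < kernel_size:
        raise ValueError(
            f"kernel size {kernel_size} is larger than input size {size}"
        )
    return (size - kernel_size) // stride + 1


def trace_shapes(height, width, channels=(16, 32, 32)):
    """Return (layer name, output shape) for each conv layer, batch size 1."""
    shapes = []
    for i, out_channels in enumerate(channels, start=1):
        height, width = conv_out(height), conv_out(width)
        shapes.append((f"conv{i}", (1, out_channels, height, width)))
    return shapes


for name, shape in trace_shapes(48, 48):
    print(name, shape)
# conv1 (1, 16, 22, 22)
# conv2 (1, 32, 9, 9)
# conv3 (1, 32, 3, 3)
```

Under these assumptions, the 9x9 and 3x3 maps in the log would belong to conv2 and conv3, and a fourth kernel-5 conv on a 3x3 map would raise an error.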

Anyway, the input features of head should be 32*3*3=288.
If your training was working, it seems that you’ve changed the input shape for testing the model?
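
To narrow down which input size the saved head was built for, the same size arithmetic can be run in reverse. This is only a sketch: it assumes square inputs and the conv settings from the printed model (kernel size 5, stride 2, no padding), and `head_features` is a hypothetical helper, not a function from the tutorial. It cannot tell you the exact size used in training, only which square sizes are consistent with in_features=2592.

```python
def conv_out(size, kernel_size=5, stride=2):
    """Output size along one dimension of an unpadded, undilated Conv2d."""
    return (size - kernel_size) // stride + 1


def head_features(height, width, num_convs=3, out_channels=32):
    """Number of flattened features fed to the linear head."""
    for _ in range(num_convs):
        height, width = conv_out(height), conv_out(width)
    return out_channels * height * width


# Features produced by the 48x48 test input from the log.
print(head_features(48, 48))  # 288

# Square input sizes that would produce the 2592 features the head expects.
matching = [s for s in range(48, 129) if head_features(s, s) == 2592]
print(matching)  # [93, 94, 95, 96, 97, 98, 99, 100]
```

If the training screens were in that range, then either feed the same size at test time or rebuild the model for 48x48 inputs and retrain, since the saved head weights only fit 2592 features.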