Hi everyone,
I am using the PyTorch DQN tutorial for RL in my own gym environment. Training completes without problems, but when I try to use the trained agent for prediction, I get this error: "RuntimeError: size mismatch, m1: [1 x 288], m2: [2592 x 7] at /opt/conda/conda-bld/pytorch_1579022119164/work/aten/src/TH/generic/THTensorMath.cpp:1366".
For prediction, this is what I do after loading the trained agent:
policy_net.load_state_dict(torch.load('agent1.pth'))
net = policy_net
# Since I am only doing prediction, I removed the epsilon-greedy logic from the tutorial (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html)
def select_test_action(state):
    global steps_done
    sample = random.random()
    if True:
        with torch.no_grad():
            # t.max(1) will return largest column value of each row.
            # second column on max result is index of where max element was
            # found, so we pick action with the larger expected reward.
            return net(state).max(1)[1].view(1, 1)
    else:
        print('there is problem in prediction process')
LOADED NET:
1st: DQN(
  (conv1): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2))
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv3): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
  (bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (head): Linear(in_features=2592, out_features=7, bias=True)
)
And here is the prediction loop:
action = np.array([0])
rewards_agent_1 = []
rewards_agent_2 = []
env.reset()
env.render(mode='human')
counter = 0
one = True
#last_screen = get_screen()
#current_screen = get_screen()
#state = current_screen - last_screen
while one:
    state, reward, done, info = env.step(action.item())
    state_np = np.array(state)
    #print(state_np.shape)
    images_np = np.full((1, 3, len(state_np), len(state_np[0])), 0)
    images_np[0, 0] = state_np[:,:,0]
    images_np[0, 1] = state_np[:,:,1]
    images_np[0, 2] = state_np[:,:,2]
    #print(images_np.shape)
    state_torch = torch.from_numpy(images_np)
    print('input: ', state_torch.shape)
    print('Starting prediction for agent 1!')
    action = select_test_action(state_torch.float())
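For reference, the channel-by-channel copy in the loop above can probably be written as a single transpose. This is only a minimal sketch: `to_nchw` and `dummy_state` are hypothetical names, and it assumes the environment returns an (H, W, C) image. Note that `np.full(..., 0)` creates an integer array, so any non-integer pixel values would be truncated on assignment; converting to float32 up front avoids that.

```python
import numpy as np


def to_nchw(state):
    """Convert an (H, W, C) image array to a float32 (1, C, H, W) batch."""
    state_np = np.asarray(state, dtype=np.float32)
    # Move the channel axis to the front, then add a leading batch axis.
    chw = np.ascontiguousarray(state_np.transpose(2, 0, 1))
    return chw[np.newaxis, ...]


# Hypothetical 48x48 RGB observation standing in for what env.step() returns.
dummy_state = np.arange(48 * 48 * 3).reshape(48, 48, 3)
batch = to_nchw(dummy_state)
print(batch.shape)  # (1, 3, 48, 48)
```

The result can then be passed to `torch.from_numpy(batch)`, which should already give a float tensor, so the extra `.float()` call would no longer be needed.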
Before the linear output layer, the two conv outputs I print are:
input: torch.Size([1, 3, 48, 48])
Starting prediction for agent 1!
conv1: torch.Size([1, 32, 9, 9])
conv2: torch.Size([1, 32, 3, 3])
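To make the numbers in the error message easier to follow, here is a rough sketch of the shape arithmetic, using the same output-size formula as the tutorial's `conv2d_size_out` helper. It assumes three unpadded convolutions with kernel size 5 and stride 2 and 32 output channels, as in the printed network. `head_in_features` is a hypothetical helper name, and 96x96 is only one example of an input size that produces 2592 features, not necessarily the size used during training.

```python
def conv2d_size_out(size, kernel_size=5, stride=2):
    """Output length of one spatial dimension after an unpadded Conv2d."""
    return (size - kernel_size) // stride + 1


def head_in_features(height, width, n_convs=3, out_channels=32):
    """Number of flattened features reaching the linear head."""
    for _ in range(n_convs):
        height = conv2d_size_out(height)
        width = conv2d_size_out(width)
    return out_channels * height * width


# The 48x48 input used at prediction time: 48 -> 22 -> 9 -> 3 per side.
print(head_in_features(48, 48))  # 288

# A hypothetical 96x96 input: 96 -> 46 -> 21 -> 9 per side.
print(head_in_features(96, 96))  # 2592

# Square input sizes that would match the head's in_features=2592.
# The search starts at 29, the smallest size that survives all three convs.
matching = [s for s in range(29, 200) if head_in_features(s, s) == 2592]
print(matching)  # [93, 94, 95, 96, 97, 98, 99, 100]
```

If this arithmetic is right, the 288 in the error comes from the 48x48 prediction input (32 * 3 * 3), while the head was built for 32 * 9 * 9 = 2592 features. That would suggest the network was constructed for a larger screen size than the one fed in at prediction time.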