output = model(*input)
output.volatile = False
for i in range(BATCH_SIZE):
    # Terminal next states get a value of 0; otherwise take the max Q-value.
    if nextState_batch[i][0][0].data[0] == 0:
        nextStateValues[i] = 0
    else:
        nextStateValues[i] = output[b[i]: b[i+1]].max(0)[0].data
expectedStateActionValues = (nextStateValues * GAMMA) + reward_batch.data.view(reward_batch.size(0), -1)
loss = F.smooth_l1_loss(stateActionValues, Variable(expectedStateActionValues))
optimizer.zero_grad()
loss.backward()
# Clip gradients to stabilise training.
for param in model.parameters():
    if param.grad is not None:
        param.grad.data.clamp_(-0.5, 0.5)
optimizer.step()
This is the code snippet that is giving me the error. I am implementing a DQN and have followed the official DQN tutorial on the PyTorch website.
Can someone please help me resolve the runtime error?
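For reference, here is a minimal sketch of the same target-value computation in current PyTorch, where the old `volatile` flag no longer exists and is replaced by a `torch.no_grad()` block. The tiny network, batch shapes, and the `non_final_mask` tensor below are stand-ins for illustration, not the code from the question:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BATCH_SIZE, GAMMA = 4, 0.99
model = nn.Linear(3, 2)  # hypothetical stand-in for the real Q-network

# Dummy batch: states, next states, rewards, chosen actions.
state_batch = torch.randn(BATCH_SIZE, 3)
next_state_batch = torch.randn(BATCH_SIZE, 3)
reward_batch = torch.randn(BATCH_SIZE)
action_batch = torch.zeros(BATCH_SIZE, 1, dtype=torch.long)
non_final_mask = torch.tensor([True, True, False, True])  # False = terminal next state

# Q(s, a) for the actions actually taken.
state_action_values = model(state_batch).gather(1, action_batch).squeeze(1)

# Target values computed without tracking gradients -- the modern
# replacement for marking the output `volatile`.
next_state_values = torch.zeros(BATCH_SIZE)
with torch.no_grad():
    next_state_values[non_final_mask] = model(next_state_batch[non_final_mask]).max(1)[0]
expected_state_action_values = reward_batch + GAMMA * next_state_values

loss = F.smooth_l1_loss(state_action_values, expected_state_action_values)
loss.backward()
```

Terminal entries of `next_state_values` stay at 0, so their target is just the reward, matching the `if`/`else` in the loop above.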