Reinforcement Learning (DQN) tutorial bugs


I’ve installed PyTorch from source (yesterday’s master), and the Reinforcement Learning (DQN) tutorial needs a couple of tweaks to run :wink: Can I update it with a PR, or will you do this? The problems are as follows (line numbers refer to the downloaded Python source):

Line 449: _, reward, done, _ = env.step(action**.cpu().numpy()**[0, 0]) — accessing a torch.Tensor element no longer returns a plain value, so it has to be converted to numpy explicitly. Note that I don’t use CUDA; converting to CPU first doesn’t hurt in my case, but it is needed for the CUDA case.
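To see the change in isolation, here is a minimal sketch (my own example, not tutorial code) of extracting a scalar action from the 1x1 tensor that select_action returns:

```python
import torch

# A 1x1 LongTensor action, like the one the tutorial's select_action() returns.
action = torch.tensor([[1]])

# Indexing no longer yields a Python int; it yields a 0-dim tensor.
element = action[0, 0]

# Two explicit conversions that env.step() will accept:
as_numpy = action.cpu().numpy()[0, 0]  # works for both CPU and CUDA tensors
as_int = action.item()                 # extracts the Python scalar directly
```

Either form works; `.item()` is the shorter option when only a single scalar is needed.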

Line 414: expected_state_action_values = Variable(**.view(-1, 1)**) — without adding this dummy dimension, the later call to F.smooth_l1_loss(...) (line 417) runs into problems.
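The shape issue can be reproduced on its own; a minimal sketch (the shapes are my assumption, matching the `(batch, 1)` output of the tutorial’s `gather` call):

```python
import torch
import torch.nn.functional as F

batch_size = 4

# Q(s, a) gathered for the taken actions has shape (batch, 1).
state_action_values = torch.rand(batch_size, 1)

# The Bellman targets start out as a flat (batch,) tensor...
expected = torch.rand(batch_size)

# ...so give them the same (batch, 1) shape before computing the loss.
loss = F.smooth_l1_loss(state_action_values, expected.view(-1, 1))
```

Without the `.view(-1, 1)`, the `(batch, 1)` vs `(batch,)` mismatch triggers unintended broadcasting (or an error, depending on the version).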

And of course torch.no_grad() should be used, but it’s not in the official release yet, so I omit it.

We haven’t updated the tutorials yet because we haven’t officially released PyTorch 0.4 yet. They’ll be updated when that happens :slight_smile:

Oh I see, thanks then :slight_smile:

What will happen if all of the sampled states have their next state as None?

if len(memory) < BATCH_SIZE:
    return
transitions = memory.sample(BATCH_SIZE)
batch = Transition(*zip(*transitions))

# Compute a mask of non-final states and concatenate the batch elements
non_final_mask = torch.tensor(tuple(map(lambda s: s is not None,
                                        batch.next_state)), device=device, dtype=torch.uint8)
non_final_next_states = torch.cat([s for s in batch.next_state
                                   if s is not None])
state_batch = torch.cat(batch.state)
action_batch = torch.cat(batch.action)
reward_batch = torch.cat(batch.reward)

Is this a bug?
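A quick way to see what happens in that corner case — a sketch, with the batch simulated as a plain list, assuming every sampled transition is terminal:

```python
import torch

# Simulate a sampled batch where every transition is terminal,
# i.e. every next_state is None.
next_states = [None, None, None]

# The tutorial's filter then produces an empty list...
non_final = [s for s in next_states if s is not None]

# ...and torch.cat() refuses an empty list, so optimize_model()
# would raise a RuntimeError at this point.
try:
    torch.cat(non_final)
except RuntimeError as e:
    print("RuntimeError:", e)
```

So an all-terminal batch does crash at the `torch.cat` call; guarding that line (or the masked assignment that uses `non_final_next_states`) is needed to handle this case.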