Reinforcement Learning (DQN) tutorial bugs

Hi!

I’ve installed PyTorch from source (yesterday’s master) and the Reinforcement Learning (DQN) tutorial needs a couple of tweaks to run :wink: Can I update it with a PR, or will you do this? The problems are as follows (line numbers refer to the downloaded Python source):

Line 449: _, reward, done, _ = env.step(action**.cpu().numpy()**[0, 0]). Accessing a torch.Tensor element no longer returns a plain Python value, so it has to be converted to numpy explicitly. Note that I don’t use CUDA; converting to CPU doesn’t hurt in my case, but it is needed for the CUDA case.
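
For reference, a minimal sketch of what I mean (the 1x1 LongTensor action and the classic four-value gym step API are assumptions to keep the example self-contained):

```python
import gym
import torch

env = gym.make('CartPole-v0').unwrapped
env.reset()

# In the tutorial the selected action is a 1x1 LongTensor
action = torch.LongTensor([[0]])

# Old code: indexing the tensor no longer yields a plain Python int
# _, reward, done, _ = env.step(action[0, 0])

# Tweak: convert to numpy first (the .cpu() is a no-op without CUDA,
# but required when the tensor lives on the GPU)
_, reward, done, _ = env.step(action.cpu().numpy()[0, 0])
```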

Line 414: expected_state_action_values = Variable(expected_state_action_values.data**.view(-1, 1)**). Without adding this dummy dimension, F.smooth_l1_loss(...) further down (line 417) complains about mismatched shapes.
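
Roughly what happens there, with dummy tensors standing in for the tutorial’s batch (BATCH_SIZE and the shapes are taken from the tutorial, the zero values are just placeholders):

```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable

BATCH_SIZE = 128

# Q(s, a).gather(1, action_batch) has shape (BATCH_SIZE, 1)
state_action_values = Variable(torch.zeros(BATCH_SIZE, 1))

# (next_state_values * GAMMA) + reward_batch has shape (BATCH_SIZE,)
expected_state_action_values = Variable(torch.zeros(BATCH_SIZE))

# Without the dummy dim the two shapes no longer line up in F.smooth_l1_loss:
# loss = F.smooth_l1_loss(state_action_values, expected_state_action_values)

# With .view(-1, 1) both tensors are (BATCH_SIZE, 1) and the loss works
expected_state_action_values = Variable(expected_state_action_values.data.view(-1, 1))
loss = F.smooth_l1_loss(state_action_values, expected_state_action_values)
```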

And of course torch.no_grad() should be used, but it’s not in an official release yet, so I omit it here.
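
For completeness, this is the kind of thing I have in mind once it lands (target_net and non_final_next_states below are hypothetical stand-ins for the tutorial’s objects):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the tutorial's target network and next-state batch
target_net = nn.Linear(4, 2)
non_final_next_states = torch.randn(8, 4)

# Computing the target values inside torch.no_grad() avoids building a graph,
# so the old volatile=True trick is no longer needed
with torch.no_grad():
    next_state_values = target_net(non_final_next_states).max(1)[0]

print(next_state_values.requires_grad)  # False
```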

We haven’t updated the tutorials yet because we haven’t officially released PyTorch 0.4. They’ll be updated when that happens :slight_smile:

Oh I see, thanks then :slight_smile:

What will happen if all of the sampled states have next_state as None?

```python
if len(memory) < BATCH_SIZE:
    return
transitions = memory.sample(BATCH_SIZE)
batch = Transition(*zip(*transitions))

# Compute a mask of non-final states and concatenate the batch elements
non_final_mask = torch.tensor(tuple(map(lambda s: s is not None,
                                        batch.next_state)), device=device, dtype=torch.uint8)
non_final_next_states = torch.cat([s for s in batch.next_state
                                   if s is not None])
state_batch = torch.cat(batch.state)
action_batch = torch.cat(batch.action)
reward_batch = torch.cat(batch.reward)
```
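
To make the worry concrete, a tiny repro of the case I mean (the all-None list simulates a batch where every sampled transition is terminal):

```python
import torch

# Pretend every sampled transition in the batch was terminal
next_states = [None, None, None]

non_final = [s for s in next_states if s is not None]  # empty list
torch.cat(non_final)  # raises a RuntimeError because the list is empty
```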

Is this a bug?