Hi,

I wanted to test multi-GPU training with a simple DQN algorithm on the CartPole environment, and it seems that either it is not possible or I am missing something. I am running the code on a machine with two GPUs. I have my code here:

The problem is that in each train step I need to obtain the target value, which is the output of a `DataParallel` model, multiply it by some elements of the batch (which is a tensor), and then use the result to compute the loss. When I multiply the target value by the tensor, I get this error:

`RuntimeError: The size of tensor a (64) must match the size of tensor b (128) at non-singleton dimension 1`

Since `DataParallel` has split the batch into two parts, I get a target of shape `2*64` instead of `128`. I can reshape the tensor to `2*64` and make it work, but that is not going to work if I have more GPUs, and the reshaping would be hard and messy. I thought there should be a better way to do this.
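To make the shapes concrete, here is a minimal CPU-only sketch of the failing multiplication. It does not use `DataParallel` at all: the `(2, 64)` tensor is only a stand-in for the target I get back with two GPUs, and `not_done` and `masked_target` are hypothetical names for illustration, not names from my actual code. Assuming the target really comes back as `(num_gpus, per_gpu_batch)`, flattening it with `reshape(-1)` avoids hard-coding the number of GPUs, though I am not sure this is the intended way to handle it.

```python
import torch


def masked_target(target, not_done):
    """Flatten the gathered target so it lines up with a 1-D batch tensor.

    reshape(-1) does not depend on how many chunks the batch was split into,
    so the same code handles (2, 64), (4, 32), or a plain (128,) target.
    """
    return target.reshape(-1) * not_done


batch_size = 128
not_done = torch.ones(batch_size)  # hypothetical per-sample mask from the batch

# Stand-in for the target I get back with two GPUs (assumed shape).
target = torch.rand(2, 64)

# Multiplying directly fails: (2, 64) cannot broadcast against (128,).
try:
    target * not_done
except RuntimeError as err:
    print("direct multiply failed:", err)

# Flattening first gives one value per sample in the batch.
out = masked_target(target, not_done)
print(out.shape)
```

The same helper can be called unchanged with a `(4, 32)` target, which is the case I was worried about with more GPUs.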

I appreciate any help or comment.

Afshin