Thanks for the debugging!
Since some methods are missing in your gist, you'll have to help me out with some more debugging.
It looks like the batch dimension is missing for x, but I’m not sure why and how your model seems to be working even without nn.DataParallel.
Usually your input should have the shape [batch_size, nb_features] to be a valid input for a linear layer.
If you are using a single sample, it should be [1, 65] for your model.
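To illustrate (a minimal sketch, with the 65 input features from your model and an assumed output size of 10), a missing batch dimension can be added with `unsqueeze(0)`:

```python
import torch
import torch.nn as nn

lin = nn.Linear(65, 10)  # 65 in_features as in your model; 10 out_features is just an assumption

x = torch.randn(65)   # single sample without a batch dimension
x = x.unsqueeze(0)    # add the batch dimension -> shape [1, 65]
out = lin(x)
print(out.shape)      # torch.Size([1, 10])
```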
Since nn.DataParallel splits your data in dim0, you should provide a batch size that is a multiple of the number of GPUs. In your setup you should provide a batch size of 3, 6, 9, ... so that the data can be split evenly among the GPUs.
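You can see the effect of this split with `torch.chunk`, which is essentially what nn.DataParallel does along dim0 (a sketch assuming 3 GPUs and a batch size of 6):

```python
import torch

batch = torch.randn(6, 65)             # batch_size = 6, a multiple of 3 GPUs
chunks = torch.chunk(batch, 3, dim=0)  # mimics how nn.DataParallel scatters the batch
print([c.shape for c in chunks])       # each replica would get a [2, 65] chunk
```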
Let me know if you need more help debugging.
Yeah, I think what's going on is that in RL we look at the state at a single timestep (size: [65]). The batch size could be greater than 1, but we still pass a single time_step of state to the model. So the model doesn't see [32, 65], but [65]. nn.DataParallel, however, expects the batch dimension to be greater than 1, since, like you say, it splits in dim0.
Most examples I see declare a Variable with something like [batch_size, num_features], but mine doesn't, so I'll have to figure out whether I want to keep passing a single example (I suppose if I want multiple GPUs, I do have to change it).