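When you wrap a model in `DataParallel`, the model's parameters and buffers have to be on the first device in `device_ids` (`cuda:0` by default), and the simplest thing is to put your input tensors there too. That is just how PyTorch's `DataParallel` works: it replicates the model and scatters each input batch to the other GPUs itself, so you don't have to copy anything manually. I think this discussion might help you: How to solve the problem of `RuntimeError: all tensors must be on devices[0]`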
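
Here is a minimal sketch of that setup, not a definitive fix for your code. The `nn.Linear(8, 2)` model, the batch shape, and the helper name `wrap_for_data_parallel` are all made up for illustration. It also falls back to a single device when fewer than two GPUs are visible, which is my own assumption about what you would want.

```python
import torch
import torch.nn as nn


def wrap_for_data_parallel(model):
    """Hypothetical helper: put the model on device_ids[0], then wrap it."""
    gpu_count = torch.cuda.device_count()
    if gpu_count > 1:
        device_ids = list(range(gpu_count))
        primary = torch.device(f"cuda:{device_ids[0]}")
        # Parameters and buffers must be on device_ids[0] before the forward pass.
        model = model.to(primary)
        model = nn.DataParallel(model, device_ids=device_ids)
    elif gpu_count == 1:
        # Single GPU: nothing to parallelize, just move the model.
        primary = torch.device("cuda:0")
        model = model.to(primary)
    else:
        # No GPU available: stay on the CPU (assumed fallback).
        primary = torch.device("cpu")
    return model, primary


# Toy model and batch, purely illustrative.
model, device = wrap_for_data_parallel(nn.Linear(8, 2))

# Put the input on the same primary device; DataParallel splits it
# along dim 0 and sends the chunks to the other GPUs on its own.
batch = torch.randn(4, 8).to(device)
output = model(batch)  # with DataParallel, the result is gathered on device_ids[0]
```

If you still get the error with a setup like this, check whether something moved part of the model (or a tensor stored on it as an attribute) to another GPU after wrapping.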