I struggled with this error a lot over the last few days. In the end I found that using self.register_buffer fixes the problem (see here for a similar thread that I learnt from).
In detail, suppose we have a tensor temp_tensor equal to torch.ones(1, 2). To put temp_tensor on the right GPU under parallel_model, we need to register it as a buffer in the module's __init__:
self.register_buffer("temp_tensor", torch.ones(1, 2))
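For completeness, here is a minimal sketch of what that looks like inside a module. MyModule and the tensor shapes are made up for illustration, and I'm assuming parallel_model is an nn.DataParallel wrapper, as in the original question:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a buffer, temp_tensor becomes part of the module's
        # state: .to(device) / .cuda() and nn.DataParallel replication move
        # it along with the parameters. A plain attribute
        # (self.temp_tensor = torch.ones(1, 2)) would be left behind on CPU.
        self.register_buffer("temp_tensor", torch.ones(1, 2))

    def forward(self, x):
        # By forward time, self.temp_tensor lives on the same device as
        # this replica's parameters, so this op won't raise a device mismatch.
        return x + self.temp_tensor

device = "cuda" if torch.cuda.is_available() else "cpu"
parallel_model = nn.DataParallel(MyModule().to(device))
out = parallel_model(torch.randn(4, 2).to(device))
print(out.shape)  # torch.Size([4, 2])
```

Note that a buffer, unlike a parameter, receives no gradients, but it is still included in the module's state_dict by default and moves with the module.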
Hope it helps!