Invalid Gradient during Backward Pass

Hi,
I have a similar issue. The question you are asking about parts use different GPUs is the case in mine. (I don’t use data parallelism).
If I use different GPUs and concat, does the loss function need to be changed? (I get the same exception.)
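For reference, here is a minimal sketch of what I mean (the two-branch model, device names, and sizes are made up for illustration; it falls back to CPU when two GPUs are not available). The key point is that `torch.cat` requires all inputs on one device, so the tensor from the other GPU has to be moved first; `.to()` is differentiable, so the loss function itself should not need to change:

```python
import torch

# Pick two devices: two GPUs if available, otherwise fall back to CPU
# (the exception only triggers with real multi-GPU tensors, but the
# fix is the same either way).
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

# Hypothetical two-branch model: each branch lives on its own device.
branch_a = torch.nn.Linear(8, 4).to(dev0)
branch_b = torch.nn.Linear(8, 4).to(dev1)

x = torch.randn(2, 8)
out_a = branch_a(x.to(dev0))
out_b = branch_b(x.to(dev1))

# torch.cat needs all inputs on ONE device; move out_b onto dev0 first.
# .to() is differentiable, so the backward pass routes gradients back
# to branch_b's device automatically.
merged = torch.cat([out_a, out_b.to(dev0)], dim=1)  # shape (2, 8)

loss = merged.sum()
loss.backward()  # gradients reach parameters on both devices
```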

Please see this post for the complete code:
https://discuss.pytorch.org/t/runtimeerror-function-catbackward-returned-an-invalid-gradient-at-index-1-expected-device-1-but-got-0/33958

Thanks for the help.