Forward changes data's device allocation (RuntimeError: tensors must be on device[0])

Has anyone had RuntimeError: all tensors must be on device[0] errors inside a forward pass?

For example, I have this in my training loop:

mixed, clean = batch[0], batch[1]
# znoise and pnoise are pre-allocated noise buffers reused each iteration
znoise = znoise.resize_(batch_size, 1024, 8).normal_(0., 1).float()
noise_clean = pnoise.resize_(clean.size()).normal_(0, 1) * input_noise_std
mixed = to_gpu(mixed, inference=False, fp16=fp16)
clean = to_gpu(clean, inference=False, fp16=fp16)
znoise = to_gpu(znoise, inference=False, fp16=fp16)
pnoise = to_gpu(pnoise, inference=False, fp16=fp16)
print("mixed", mixed.get_device()) # outputs 0
print("znoise", znoise.get_device()) # outputs 0
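For context, the `to_gpu` helper probably looks something like the sketch below (hypothetical; the thread never shows the real implementation, so the `inference` and `fp16` handling here is an assumption):

```python
import torch

def to_gpu(x, inference=False, fp16=False):
    # Hypothetical sketch of the helper used above: optionally cast to
    # half precision, move to the current CUDA device, and detach the
    # tensor when running inference. The actual implementation is not
    # shown in the thread.
    if fp16:
        x = x.half()
    if torch.cuda.is_available():
        x = x.cuda()
    if inference:
        x = x.detach()
    return x

t = to_gpu(torch.zeros(2, 3))
print(t.shape)  # torch.Size([2, 3])
```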

Then I call model(mixed, znoise), where forward is:

def forward(self, wav, znoise):
    print(wav.get_device(), znoise.get_device(), cuda.current_device()) # outputs 0, 0, 0 then 1,1,1
    # encoder-decoder ladder network with wav and znoise
    # ...
    return output_dec

Executing model(mixed, znoise) leads to RuntimeError: all tensors must be on device[0]


What is your model and how are you running the training loop? It’s a little hard to tell from what you’ve provided where the tensors are being moved to another device.

The model is a DCGAN-like Generator with a ladder-network structure: Convs with PReLU, and “Deconvs” with skip connections and PReLU.
The model is wrapped in DataParallel and then moved to the GPU with cuda().
Batches are loaded with a DataLoader, and the training loop runs once per iteration.
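In outline, that setup looks like the sketch below (the layers and dataset shapes are placeholders, not the actual generator):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder standing in for the DCGAN-style ladder-network generator.
model = nn.Sequential(nn.Conv1d(1, 8, 3, padding=1), nn.PReLU())
model = nn.DataParallel(model)      # wrap once for multi-GPU
if torch.cuda.is_available():
    model = model.cuda()            # then move to the GPUs

# Toy dataset of (mixed, clean) waveform pairs.
dataset = TensorDataset(torch.randn(16, 1, 64), torch.randn(16, 1, 64))
loader = DataLoader(dataset, batch_size=4)

for batch in loader:
    mixed, clean = batch[0], batch[1]
    if torch.cuda.is_available():
        mixed, clean = mixed.cuda(), clean.cuda()
    output = model(mixed)           # DataParallel scatters the batch across GPUs
```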

The forward pass executes with no problem until the return, where it outputs the error message.
I can be more explicit in code if need be.

The error happened because I was calling DataParallel on the model twice!


Interesting. Glad you figured it out!

Had the same problem, thank you for figuring it out.
The error message could be a lot clearer.

RuntimeError: all tensors must be on device[0]

should be something like:

RuntimeError: Model already contains a DataParallel submodule

After hours of debugging and attempting every possible solution out there, I was also calling DataParallel twice.
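Until the error message improves, a small guard can make the double wrap harmless (`wrap_once` is a hypothetical helper, not part of PyTorch or the thread):

```python
import torch.nn as nn

def wrap_once(model):
    """Wrap a model in DataParallel only if it isn't wrapped already.
    Hypothetical helper to defend against accidental double wrapping."""
    if isinstance(model, nn.DataParallel):
        return model
    return nn.DataParallel(model)

m = nn.Linear(4, 4)
m = wrap_once(m)
m = wrap_once(m)  # second call is a no-op instead of nesting a second wrapper
print(type(m.module).__name__)  # Linear
```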
Thanks :slight_smile: