loss.backward() does not go through during training and throws an error when running on multiple GPUs with torch.nn.DataParallel:
grad can be implicitly created only for scalar outputs
But the same model trains fine when I pass only a single GPU in
device_ids= to torch.nn.DataParallel.
Is there something I am missing here?
While running on two GPUs, the loss function returns a vector of two loss values (one per GPU). If I run backward() on only the first element of the vector, it works fine.
How can I make backward() work with a vector containing two or more loss values?
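For reference, here is a minimal sketch of the two reductions I am aware of, using a toy loss vector in place of the real per-GPU losses gathered by DataParallel (the tensors here are stand-ins, not my actual training code):

```python
import torch

# Simulate what DataParallel gathers: one loss value per GPU,
# so `loss` is a length-2 vector rather than a scalar.
loss = torch.tensor([0.5, 0.7], requires_grad=True)

# Option 1: reduce the vector to a scalar before backward().
# mean() keeps the loss scale independent of the number of GPUs.
loss.mean().backward()

# Option 2: pass an explicit gradient tensor to backward(),
# which is how autograd handles non-scalar outputs.
loss2 = torch.tensor([0.5, 0.7], requires_grad=True)
loss2.backward(torch.ones_like(loss2))
```

Is one of these the recommended way when the per-GPU losses come from DataParallel, or does the loss module itself need to return a scalar?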