Hi all, I am trying to distribute several models (say 8) across 8 GPU devices. For each model models[i], I compute its loss on its own device and accumulate the sum as follows:
loss = 0  # accumulated over all models
for i in range(args.num_models):
    # move the data, criterion, and model to the i-th GPU
    input_var = input_var.to("cuda:%d" % (i))
    target_var = target_var.to("cuda:%d" % (i))
    criterion = nn.CrossEntropyLoss().to("cuda:%d" % (i))
    models[i] = models[i].to("cuda:%d" % (i))
    # each loss term lives on a different device
    loss += criterion(models[i](input_var), target_var)
loss.backward()
But I get the following error:
RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected device cuda:1 but got cuda:0
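Would it be correct to move each per-device loss onto a single device before accumulating? Below is a rough sketch of what I have in mind; the dummy linear models, the random data, and num_models = 2 are just placeholders for my real setup, so I am not sure this is the right approach:

import torch
import torch.nn as nn

num_models = 2  # placeholder for args.num_models
devices = [torch.device("cuda:%d" % i) for i in range(num_models)]

# dummy models and data standing in for my real ones
models = [nn.Linear(10, 5).to(devices[i]) for i in range(num_models)]
input_cpu = torch.randn(4, 10)
target_cpu = torch.randint(0, 5, (4,))

loss = torch.zeros(1, device=devices[0])
for i in range(num_models):
    input_var = input_cpu.to(devices[i])
    target_var = target_cpu.to(devices[i])
    criterion = nn.CrossEntropyLoss()
    # compute the loss on cuda:i, then move it to cuda:0 before summing
    loss = loss + criterion(models[i](input_var), target_var).to(devices[0])

loss.backward()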
Any suggestions? Thanks!