I have seen other posts about this error, but their cases are different from mine.
After spending a whole night debugging, I located the error, but I can't fix it (I can't figure out why it's happening).
Here is the minimal code to reproduce it:
from easydl import *

setGPU('0,1')

feature_extractor = nn.Linear(10, 10)
classifier = nn.Linear(10, 10)
net = nn.Sequential(feature_extractor, classifier)
net.cuda()
net = nn.DataParallel(net)

discriminator = nn.Sequential(
    # place 1
    GradientReverseModule(lambda step: aToBSheduler(step, 0.0, 1.0, gamma=10, max_iter=10000)),
    nn.Linear(10, 1)
)
discriminator.cuda()
discriminator = nn.DataParallel(discriminator)

op = optim.SGD(net.parameters(), lr=1)

for _ in range(2):
    with OptimizerManager(op):
        im_source = Variable(torch.from_numpy(np.random.rand(36, 10).astype(np.float32))).cuda()
        im_target = Variable(torch.from_numpy(np.random.rand(36, 10).astype(np.float32))).cuda()
        outs_source = net.forward(im_source)
        outs_target = net.forward(im_target)
        d_source = discriminator(outs_source)
        d_target = discriminator(outs_target)
        if len(sys.argv) > 1:
            # place 2
            loss = torch.sum(outs_source) + torch.sum(outs_target) + torch.sum(d_source) + torch.sum(d_source)
        else:
            # place 3
            loss = torch.sum(outs_source) + torch.sum(outs_target)
        loss = loss * loss.detach()
        loss.backward()
The error happens at the line loss.backward().
There are 3 places that I marked in the code above.
I have made 2 observations:
- If the code at place 1 is removed, no error is reported.
- Otherwise, if I use place 2, I get the error "arguments are located on different GPUs". If I use place 3, no error is reported.
The code at place 1 has documentation here. In short, it serves as an identity mapping in the forward pass and reverses the gradient in the backward pass. The scheduler gradually changes the coefficient applied in the backward pass. The documentation for aToBSheduler is here.
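To make the behavior of place 1 concrete, here is a minimal single-device sketch of a gradient reversal layer with a scheduled coefficient. It is not easydl's actual implementation: GradientReverse and a_to_b_schedule are hypothetical names, and the sigmoid-shaped ramp formula is my assumption about what a scheduler like aToBSheduler computes.

```python
import math

import torch


def a_to_b_schedule(step, a, b, gamma=10.0, max_iter=10000.0):
    """Hypothetical stand-in for aToBSheduler (assumed formula, not taken from easydl)."""
    # Sigmoid-shaped ramp: returns a at step 0 and approaches b as step grows.
    return a + (2.0 / (1.0 + math.exp(-gamma * step / max_iter)) - 1.0) * (b - a)


class GradientReverse(torch.autograd.Function):
    """Hypothetical gradient reversal function: identity forward, negated gradient backward."""

    @staticmethod
    def forward(ctx, x, coeff):
        # Remember the coefficient for the backward pass; the output equals the input.
        ctx.coeff = coeff
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign and scale the incoming gradient; coeff itself gets no gradient.
        return -ctx.coeff * grad_output, None


x = torch.ones(3, requires_grad=True)
coeff = a_to_b_schedule(step=10000, a=0.0, b=1.0)  # close to 1.0 at the end of the ramp
y = GradientReverse.apply(x, coeff)
y.sum().backward()
```

After this runs, y holds the same values as x, while x.grad is -coeff in every position instead of +1, which is the reversal described above.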
How can I fix this if I want to keep the code at place 1?