Happy holidays!
I’d like to use only part of the loss tensor for backpropagation, because the entries with zero loss lead to NaN gradients.
See the toy example below:
import numpy as np
import torch
from torch.autograd import Variable

# model and optim are defined elsewhere
x = Variable(torch.FloatTensor(np.random.rand(3, 1, 64, 64))).cuda(1)
y = Variable(torch.FloatTensor(np.random.rand(3, 1, 64, 64))).cuda(1)
# make one sample of y identical to x, so its loss entry is exactly zero
idx = np.random.randint(0, x.size()[0])
y[idx, ...] = x[idx, ...]
x = model(x)
y = model(y)
# per-sample Euclidean distance between the two outputs
loss = x.sub(y).pow(2.0).sum(dim=1).pow(0.5)
print(loss)
# keep only the non-zero entries
loss = loss[loss.data > 0.0]
print(loss)
loss = loss.mean()
optim.zero_grad()
loss.backward()
optim.step()
print(model._modules['net'][0].weight[0, 0, 0:5, 0:5])
Here’s the output:
Variable containing:
0.0000
0.4017
0.3996
[torch.cuda.FloatTensor of size 3 (GPU 1)]
Variable containing:
0.4017
0.3996
[torch.cuda.FloatTensor of size 2 (GPU 1)]
Variable containing:
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
[torch.cuda.FloatTensor of size 5x5 (GPU 1)]
I think I’m masking out the zero-loss entry, but it still seems to affect the result.
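To narrow it down, here’s a minimal repro without the model (the variable names are just for illustration). My guess is that the backward of pow(0.5) has an infinite slope at exactly zero, and the zero gradient coming from the mask times that infinity gives NaN:

a = Variable(torch.FloatTensor([0.0, 0.5, 0.7]), requires_grad=True)
d = a.pow(0.5)          # d(sqrt)/da = 0.5 / sqrt(a), which is inf at a == 0
m = d[d.data > 0.0]     # mask applied after the sqrt, same as in my code above
m.mean().backward()
print(a.grad)           # first entry is nan: 0 (masked-out grad) * inf = nan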
If I comment out these two lines (so that x and y are different everywhere):
idx = np.random.randint(0, x.size()[0])
y[idx, ...] = x[idx, ...]
the result is not NaN.
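In case it helps, here’s a sketch of the workaround I’m considering: mask the squared distances before taking the sqrt, so zero never goes through pow(0.5) at all (I haven’t confirmed this is the right fix):

sq = x.sub(y).pow(2.0).sum(dim=1)           # squared distances; the zero stays here
loss = sq[sq.data > 0.0].pow(0.5).mean()    # sqrt only on the non-zero entries

Adding a small epsilon inside the sqrt, e.g. (sq + 1e-8).pow(0.5), would presumably sidestep it as well.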
Could someone point out my mistake here?
Thanks!!