Use part of the loss tensor for backpropagation?

Happy holidays!
I’d like to use only part of the loss tensor for backpropagation, because a zero loss leads to nan…
See the toy example below:

import numpy as np
import torch
from torch.autograd import Variable

# `model` (the network) and `optim` (its optimizer) are created elsewhere

# two random batches on GPU 1
x = Variable(torch.FloatTensor(np.random.rand(3, 1, 64, 64))).cuda(1)
y = Variable(torch.FloatTensor(np.random.rand(3, 1, 64, 64))).cuda(1)

# copy one random sample of x into y, so that sample gives exactly zero loss
idx = np.random.randint(0, x.size()[0])
y[idx, ...] = x[idx, ...]

x = model(x)
y = model(y)

# per-sample Euclidean distance between the two outputs
loss = x.sub(y).pow(2.0).sum(dim=1).pow(0.5)
print(loss)  # loss before masking

# keep only the non-zero entries
loss = loss[loss.data > 0.0]
print(loss)  # loss after masking

loss = loss.mean()

optim.zero_grad()
loss.backward()
optim.step()

# part of the first conv layer's weights after the update
print(model._modules['net'][0].weight[0, 0, 0:5, 0:5])

Here’s the output:

Variable containing:
0.0000
0.4017
0.3996
[torch.cuda.FloatTensor of size 3 (GPU 1)]

Variable containing:
0.4017
0.3996
[torch.cuda.FloatTensor of size 2 (GPU 1)]

Variable containing:
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
[torch.cuda.FloatTensor of size 5x5 (GPU 1)]

I think I’m masking out the zero loss, but it seems like it still affects the result (there’s a minimal repro of just this masking step further below).
If I comment out these two lines (so that x and y differ in every sample):

idx = np.random.randint(0, x.size()[0])
y[idx, ...] = x[idx, ...]

the result is not nan.
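To show what I mean, here’s a minimal sketch of the same pow(0.5)-then-mask pattern on tiny tensors, with the model taken out of the picture (same imports as above; this is just an illustration, not my actual code):

x = Variable(torch.FloatTensor([0.0, 1.0]), requires_grad=True)

loss = x.pow(0.5)              # element-wise sqrt; the first entry is exactly zero
loss = loss[loss.data > 0.0]   # mask out the zero entry, same as above

loss.mean().backward()
print(x.grad)                  # the entry that was masked out still comes back as nan

So even with the zero entry removed from the loss, its gradient ends up as nan, which looks like the same behaviour I see with the full model.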

Could someone point out my mistake here?

Thanks!!

What does your model look like?

Hi! Here’s the network definition:

import torch.nn as nn

# SpatialSubtractiveNormalization2d is a custom module of mine;
# in_channels, conv_kernel and pool_kernel are defined elsewhere

layers = []

layers += [nn.Conv2d(in_channels, 32, kernel_size=conv_kernel[0]),
           nn.Tanh(),
           nn.LPPool2d(norm_type=2, kernel_size=pool_kernel[0])]
layers += [SpatialSubtractiveNormalization2d(in_channels=32, kernel_size=5)]

layers += [nn.Conv2d(32, 64, kernel_size=conv_kernel[1]),
           nn.Tanh(),
           nn.LPPool2d(norm_type=2, kernel_size=pool_kernel[1])]
layers += [SpatialSubtractiveNormalization2d(in_channels=64, kernel_size=5)]

layers += [nn.Conv2d(64, 128, kernel_size=conv_kernel[2]),
           nn.Tanh(),
           nn.LPPool2d(norm_type=2, kernel_size=pool_kernel[2])]
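
The layer list then gets wrapped roughly like this (sketch only, so that the indexing model._modules['net'][0] in the first snippet makes sense; the exact class is not shown here):

class Net(nn.Module):                      # sketch; the actual class name may differ
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Sequential(*layers)  # so model._modules['net'][0] is the first Conv2d

    def forward(self, x):
        return self.net(x)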

Why does the model matter?