Use part of the loss tensor for backpropagation?

Happy holidays!
I’d like to use only part of the loss tensor for backpropagation, because a zero loss leads to nan…
See the toy example below:

import numpy as np
import torch
from torch.autograd import Variable

# `model` (the network) and `optim` (its optimizer) are created elsewhere

# two random batches on GPU 1
x = Variable(torch.FloatTensor(np.random.rand(3, 1, 64, 64))).cuda(1)
y = Variable(torch.FloatTensor(np.random.rand(3, 1, 64, 64))).cuda(1)

# copy one random sample of x into y, so that sample gives exactly zero loss
idx = np.random.randint(0, x.size()[0])
y[idx, ...] = x[idx, ...]

x = model(x)
y = model(y)

# per-sample Euclidean distance between the two outputs
loss = x.sub(y).pow(2.0).sum(dim=1).pow(0.5)
print(loss)  # loss before masking

# keep only the non-zero entries
loss = loss[loss.data > 0.0]
print(loss)  # loss after masking

loss = loss.mean()

optim.zero_grad()
loss.backward()
optim.step()

# part of the first conv layer's weights after the update
print(model._modules['net'][0].weight[0, 0, 0:5, 0:5])

Here’s the output:

Variable containing:
0.0000
0.4017
0.3996
[torch.cuda.FloatTensor of size 3 (GPU 1)]

Variable containing:
0.4017
0.3996
[torch.cuda.FloatTensor of size 2 (GPU 1)]

Variable containing:
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
nan nan nan nan nan
[torch.cuda.FloatTensor of size 5x5 (GPU 1)]

I think I’m masking out the zero loss, but it seems like it still affects the result (there’s a minimal repro of just this masking step further below).
If I comment out these two lines (so that x and y differ in every sample):

idx = np.random.randint(0, x.size()[0])
y[idx, ...] = x[idx, ...]

the result is not nan.
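To show what I mean, here’s a minimal sketch of the same pow(0.5)-then-mask pattern on tiny tensors, with the model taken out of the picture (same imports as above; this is just an illustration, not my actual code):

x = Variable(torch.FloatTensor([0.0, 1.0]), requires_grad=True)

loss = x.pow(0.5)              # element-wise sqrt; the first entry is exactly zero
loss = loss[loss.data > 0.0]   # mask out the zero entry, same as above

loss.mean().backward()
print(x.grad)                  # the entry that was masked out still comes back as nan

So even with the zero entry removed from the loss, its gradient ends up as nan, which looks like the same behaviour I see with the full model.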

Could someone point out my mistake here?

Thanks!!

What does your model look like?

Hi! Here’s the network definition:

import torch.nn as nn

# SpatialSubtractiveNormalization2d is a custom module of mine;
# in_channels, conv_kernel and pool_kernel are defined elsewhere

layers = []

layers += [nn.Conv2d(in_channels, 32, kernel_size=conv_kernel[0]),
           nn.Tanh(),
           nn.LPPool2d(norm_type=2, kernel_size=pool_kernel[0])]
layers += [SpatialSubtractiveNormalization2d(in_channels=32, kernel_size=5)]

layers += [nn.Conv2d(32, 64, kernel_size=conv_kernel[1]),
           nn.Tanh(),
           nn.LPPool2d(norm_type=2, kernel_size=pool_kernel[1])]
layers += [SpatialSubtractiveNormalization2d(in_channels=64, kernel_size=5)]

layers += [nn.Conv2d(64, 128, kernel_size=conv_kernel[2]),
           nn.Tanh(),
           nn.LPPool2d(norm_type=2, kernel_size=pool_kernel[2])]
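
The layer list then gets wrapped roughly like this (sketch only, so that the indexing model._modules['net'][0] in the first snippet makes sense; the exact class is not shown here):

class Net(nn.Module):                      # sketch; the actual class name may differ
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Sequential(*layers)  # so model._modules['net'][0] is the first Conv2d

    def forward(self, x):
        return self.net(x)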

Why does the model matter?