Backward of InstanceNorm2d

Dear all,
I have a very simple question about the gradient flowing backward through an InstanceNorm2d layer.
Here is my test code:

import torch
import torch.nn as nn

x = torch.arange(0., 8).reshape((2, 1, 2, 2))
x.requires_grad = True
instaceN = nn.InstanceNorm2d(1, affine=False, eps=0.0, track_running_stats=False)
instaceN.weight = nn.Parameter(torch.Tensor([1.0]))
instaceN.bias = nn.Parameter(torch.Tensor([0.0]))
y = instaceN(x)
y.register_hook(print)
z = y.sum()
z.backward()
print(x.grad)

But x.grad comes out as all zeros, and I am wondering how this can happen.
If I change InstanceNorm2d to BatchNorm2d, x.grad is still all zeros.
But if I change InstanceNorm2d to Sigmoid, x.grad has non-zero gradients.

Could someone explain this weird phenomenon?
Thanks!

Hi,

I’m not sure about the exact formula for instance norm, but in your code sample you can move everything to double and then use torch.autograd.gradcheck(instaceN, (x,)) to check the gradients with finite differences, and the check passes.
So these are the expected gradients. I think the Jacobian J itself is not zero, but the sum() operation on y means you effectively compute 1^T J, i.e. the column sums of J, and those sums are all 0.
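
For reference, here is a minimal sketch of that check (the double cast and the fresh module instance are my additions, following the description above):

x64 = x.detach().double().requires_grad_()
inst64 = nn.InstanceNorm2d(1, eps=0.0).double()
# compares the analytical gradients against finite differences; should print True
print(torch.autograd.gradcheck(inst64, (x64,)))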

First of all, a lot of your code is unnecessary. You don’t need to repeat default parameters, and weight and bias aren’t used with affine=False (default). Cleaning it up, we have

x = torch.arange(0., 8).reshape((2, 1, 2, 2))
x.requires_grad = True
instaceN = nn.InstanceNorm2d(1, eps=0.0)
y = instaceN(x)
z = y.sum()
z.backward()
print(x.grad)

Back to your problem: the sum over any channel of an InstanceNorm output is always zero, because the per-channel mean has been subtracted out. So for your specific z = y.sum(), every input x maps to z = 0; z is constant in x, and the gradient of a constant is zero, which is why x.grad comes out as all zeros.
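
As a quick numerical check of that claim (a sketch with a fresh random input, not taken from the original post):

x = torch.randn(2, 1, 2, 2, requires_grad=True)
y = nn.InstanceNorm2d(1, eps=0.0)(x)
print(y.sum(dim=(2, 3)))  # ~0 for every (sample, channel) pair
z = y.sum()
z.backward()
print(x.grad)             # all (numerically) zero again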