But the output of x.grad is all zeros, and I am wondering how this can happen.
If I change InstanceNorm2d to BatchNorm2d, x.grad is still all zeros.
But if I change InstanceNorm2d to Sigmoid, x.grad has non-zero entries.

Could someone explain this weird phenomenon?
Thanks!

I’m not sure about the exact formula for instance norm, but in your code sample you can move everything to double and then use torch.autograd.gradcheck(instanceN, (x,)) to check the gradients against finite differences, and the check passes.
So these are the expected gradients. The Jacobian J itself is not zero, but the sum() operation on y means you effectively compute 1^T J, i.e. the column sums of J, and those column sums are all zero.
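A minimal sketch of that check, assuming the poster's setup (a 1-channel InstanceNorm2d with eps=0.0 applied to a (2, 1, 2, 2) input); gradcheck needs float64 inputs to pass its default finite-difference tolerances:

```python
import torch
import torch.nn as nn

# Move the module and input to double precision, as suggested above.
instanceN = nn.InstanceNorm2d(1, eps=0.0).double()
x = torch.arange(0., 8, dtype=torch.float64).reshape(2, 1, 2, 2)
x.requires_grad = True

# gradcheck compares autograd's analytical Jacobian against a
# finite-difference estimate and returns True when they agree.
print(torch.autograd.gradcheck(instanceN, (x,)))
```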

First of all, a lot of your code is unnecessary: you don’t need to repeat default parameters, and weight and bias are not used when affine=False (the default). Cleaning it up, we have:

x = torch.arange(0., 8).reshape((2, 1, 2, 2))
x.requires_grad = True
instanceN = nn.InstanceNorm2d(1, eps=0.0)
y = instanceN(x)
z = y.sum()
z.backward()
print(x.grad)

Back to your problem: the sum over any channel of an IN output is always zero, because instance norm subtracts the per-channel mean. So for your specific z formula, every input x maps to z = 0; a constant output naturally has an all-zero gradient with respect to x.