As you can see, this is a linear network with no activation function.
I think that when I call the backward function, the gradient of the input vector should be constant, but the result changes with the input. The following image shows the gradient for an all-zero input vector.

The gradient is not constant; it depends on the input. The gradient here is the derivative of the loss with respect to the weights. When the input is all zeros, that derivative is zero everywhere (since d(x·W)/dW = x = 0), but when the input is nonzero somewhere, the derivative is no longer zero everywhere.
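A minimal sketch of this point, assuming a single `nn.Linear` layer and a sum-of-outputs loss (the original post's actual model and loss are not shown): the weight gradient of a linear layer is the input itself, so an all-zero input yields an all-zero weight gradient.

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(3, 2, bias=False)

# All-zero input: d(sum(x @ W.T))/dW = x = 0, so the weight grad is zero
x0 = torch.zeros(1, 3)
linear(x0).sum().backward()
print(linear.weight.grad)  # tensor of all zeros

# Nonzero input: the weight grad is the input broadcast across output rows
linear.zero_grad()
x1 = torch.ones(1, 3)
linear(x1).sum().backward()
print(linear.weight.grad)  # tensor of all ones
```

The same layer thus produces different weight gradients purely because the input changed, even though the network itself is linear.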

Yes, I have found the reason. The network applies a normalization to its input and output, so it is no longer a linear network, and dy/dx therefore depends on x.
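This can be illustrated with a small sketch (the actual normalization used in the original network is not shown, so L2 normalization of the input is assumed here as a stand-in): for a plain linear map, dy/dx = W is the same for every input, but once the input is normalized, the input gradient changes with x.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(2, 3)

def input_grad(f, x):
    # Compute d(sum(f(x)))/dx for a fresh leaf copy of x
    x = x.clone().requires_grad_(True)
    f(x).sum().backward()
    return x.grad

plain = lambda x: x @ W.T                        # linear: dy/dx = W, constant
normed = lambda x: F.normalize(x, dim=-1) @ W.T  # hypothetical normalization: nonlinear in x

x1 = torch.tensor([[1.0, 2.0, 3.0]])
x2 = 2.0 * x1

print(torch.allclose(input_grad(plain, x1), input_grad(plain, x2)))   # True
print(torch.allclose(input_grad(normed, x1), input_grad(normed, x2)))
```

For the plain map the two gradients match exactly; for the normalized map they generically differ (here scaling the input by 2 halves the gradient, since `normalize` is scale-invariant).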