I'm trying to use backward() to compute the gradient with respect to the parameters of a linear network. I found that if the weight matrix is sparse, say one row is all zeros, the gradient computed by backpropagation can be wrong: some gradients that shouldn't be zero come back as zero. Can anyone help me verify this?
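For reference, here is a minimal sketch of the kind of setup I mean (the layer sizes and variable names are just for illustration, not my actual model):

```python
import torch

# Minimal illustration: a two-layer linear network where one row of the
# first layer's weight matrix is zeroed out.
torch.manual_seed(0)

lin1 = torch.nn.Linear(3, 3, bias=False)
lin2 = torch.nn.Linear(3, 1, bias=False)

with torch.no_grad():
    lin1.weight[0].zero_()  # make the first row of W1 all zeros

x = torch.randn(4, 3)
loss = lin2(lin1(x)).sum()
loss.backward()

# Inspect the gradient w.r.t. the zeroed row; mathematically it depends on
# the input and on lin2's weights, not on the row's own (zero) values.
print(lin1.weight.grad[0])
```

This is where I'd expect a nonzero gradient for the zeroed row, since dL/dW1[0] depends on x and W2 rather than on W1[0] itself.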
Thanks