Interpreting gradcheck errors

Hi, sorry for the basic question.

Gradcheck seems to give 2 separate outputs on a failure: numerical and analytical.

What is the difference between the two? And how would I use this information to find a problem in my model?

Example output from a failed gradcheck…

RuntimeError: for output no. 0,
numerical:(
1 0 0 … 0 0 0
0 1 0 … 0 0 0
0 0 1 … 0 0 0
… ⋱ …
0 0 0 … 1 0 0
0 0 0 … 0 1 0
0 0 0 … 0 0 1
[torch.FloatTensor of size 150x150]
,)
analytical:(
0 0 0 … 0 0 0
0 0 0 … 0 0 0
0 0 0 … 0 0 0
… ⋱ …
0 0 0 … 0 0 0
0 0 0 … 0 0 0
0 0 0 … 0 0 0
[torch.FloatTensor of size 150x150]
,)

They are the numerical Jacobian, estimated with pointwise perturbations, and the analytical Jacobian, computed by autograd.

The gradcheck error message isn’t the nicest… it is improved on master though!

The numerical Jacobian seems to imply that your module is an identity op, but the analytical one says the output is independent of the input (assuming that the first matrix is I and the second is 0).

Gotcha. So the numerical Jacobian is the result of comparing the forward pass at 2 very close input values (finite differences).

And the analytical one is what autograd computed using automatic differentiation.

They should closely match; otherwise there is an error.
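As a minimal sketch of what gradcheck does (the function `f` here is my own example, not from the thread): it perturbs each input element by `eps` to build the numerical Jacobian, asks autograd for the analytical one, and returns True if they agree within tolerance.

```python
import torch
from torch.autograd import gradcheck

# A simple, correctly differentiable function to check.
def f(x):
    return (x ** 2).sum()

# gradcheck wants double precision (to keep finite-difference noise down)
# and requires_grad=True on the input.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
print(gradcheck(f, (x,), eps=1e-6, atol=1e-4))  # prints True
```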

Yes, that is correct.

Are you seeing this from a custom autograd function (i.e. you wrote the backward), or from the built-in autograd operations (i.e. backward is automatically computed)?

It’s from a built in. I think it’s caused by the way I declared Variables on the input parameters. Still tracking it down.

Sorry, I should have explained a bit better for the community…

So my understanding is an error like this…

RuntimeError: for output no. 0,
numerical:(
1 0 0 … 0 0 0
0 1 0 … 0 0 0
0 0 1 … 0 0 0
… ⋱ …
0 0 0 … 1 0 0
0 0 0 … 0 1 0
0 0 0 … 0 0 1
[torch.FloatTensor of size 150x150]
,)
analytical:(
0 0 0 … 0 0 0
0 0 0 … 0 0 0
0 0 0 … 0 0 0
… ⋱ …
0 0 0 … 0 0 0
0 0 0 … 0 0 0
0 0 0 … 0 0 0
[torch.FloatTensor of size 150x150]
,)

The giveaway is the matrix of zeros in the "analytical" result. All the grads are zero! I guess this means that somewhere in the computation graph, autograd lost track of the gradients, possibly by passing through a Variable I created that didn't have gradients in it.

That's how I'm interpreting that result.

So I’m checking all my Variables to make sure the ones that should have gradients do have them.
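Here is a hedged sketch of a function that reproduces this exact failure mode (the `broken_identity` name and the `detach` trick are my own illustration, not from the thread): `.detach()` cuts the graph, so numerically the function behaves like the identity, but autograd sees no gradient path back to the input.

```python
import torch
from torch.autograd import gradcheck

def broken_identity(x):
    # .detach() severs the graph; "+ 0 * x" keeps the output attached
    # so autograd still runs, but computes a gradient of zero.
    return x.detach() + 0 * x

x = torch.randn(4, dtype=torch.double, requires_grad=True)
try:
    gradcheck(broken_identity, (x,), eps=1e-6, atol=1e-4)
except RuntimeError as e:
    # numerical Jacobian is the identity, analytical is all zeros,
    # matching the error dump above
    print("gradcheck failed:", type(e).__name__)
```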


Has your problem been solved? I am implementing a custom operator according to https://pytorch.org/docs/master/notes/extending.html and also get zero analytical gradients. I am wondering whether it is a problem with the inputs or with my implementation of the backward function.

Just fixed the problem. Pay attention to the tutorial:

import torch
from torch.autograd import Variable, gradcheck

input = (Variable(torch.randn(20, 20).double(), requires_grad=True), Variable(torch.randn(30, 20).double(), requires_grad=True))
test = gradcheck(Linear.apply, input, eps=1e-6, atol=1e-4)

(Linear here is the custom autograd Function defined in that tutorial.)

We must set requires_grad=True for the variables.
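For a self-contained version of the same pattern (my own toy Function standing in for the tutorial's Linear, written in the newer tensor-based style rather than with Variable):

```python
import torch
from torch.autograd import gradcheck

# A toy custom Function with a hand-written backward: y = 3 * x.
class MulThree(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 3

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 3

# Inputs must be double precision and have requires_grad=True,
# otherwise gradcheck skips them.
x = torch.randn(10, dtype=torch.double, requires_grad=True)
print(gradcheck(MulThree.apply, (x,), eps=1e-6, atol=1e-4))  # prints True
```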

Yes, that's a bit confusing too: gradcheck checks whether requires_grad=True on each input, and if it is False, it doesn't run the check on that input.

Normally during training, requires_grad=True on the inputs is only needed if you intend to propagate the gradient outside the model, say into embeddings or something like that. That's the confusing bit :slight_smile:

However, it is helpful, because sometimes there are inputs that you don’t actually want to grad-check. So in that case, you can have gradcheck ignore them by setting requires_grad=False.
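A quick sketch of that skip behaviour (the `scale` function is my own example): the second input is left with requires_grad=False, so gradcheck only checks the Jacobian with respect to the first.

```python
import torch
from torch.autograd import gradcheck

def scale(x, factor):
    return x * factor

x = torch.randn(5, dtype=torch.double, requires_grad=True)
# factor has requires_grad=False (the default), so gradcheck ignores it
factor = torch.tensor(2.0, dtype=torch.double)
print(gradcheck(scale, (x, factor), eps=1e-6, atol=1e-4))  # prints True
```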
