Conditional computation that saves computation

Hi Alban,
it seems that the technique you suggested here doesn’t work for me. I am having issues with index_add. The code that you provided works fine, but when I try to do something similar in my algorithm, it does not work. I could not find much information about index_add. Is it a commonly used function?
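For reference, the pattern I am trying to reproduce looks roughly like this (a sketch with made-up shapes, not my actual code):

```python
import torch

# Sketch of index_add_ semantics: rows of `src` are accumulated into `out`
# at the positions given by `index`.
out = torch.zeros(5, 3)
index = torch.tensor([0, 2, 4])      # integer (long) index tensor
src = torch.ones(3, 3)
out.index_add_(0, index, src)        # in-place: adds rows of src into out[0], out[2], out[4]
print(out)
```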

Hi,
it’s not possible to backpropagate through the graphs! I get the following error:
'NoneType' object has no attribute 'data'

If I take Alban’s example, wrap x and y in Variables, and add out.sum().backward() at the end, then it backpropagates with no problem.
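Something like this hedged reconstruction of the pattern (Alban’s exact example may differ; with recent PyTorch, requires_grad=True plays the role of wrapping in Variables):

```python
import torch

# Sketch: split a batch by index, process each part, recombine with index_add_,
# then backpropagate.
x = torch.randn(6, 4, requires_grad=True)
y = torch.randn(6, 4, requires_grad=True)

idx_a = torch.tensor([0, 2, 4])
idx_b = torch.tensor([1, 3, 5])

out = torch.zeros(6, 4)
out.index_add_(0, idx_a, x[idx_a] * 2)   # "branch A" computation
out.index_add_(0, idx_b, y[idx_b] + 1)   # "branch B" computation

out.sum().backward()                      # gradients reach x and y with no problem
print(x.grad is not None, y.grad is not None)
```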

Unless you can show us your code and a full stack trace, I don’t think anyone can help you track down your bug.

Hi,
So in your case, you are differentiating with respect to the inputs (say, input images)? Isn’t that unconventional?
In my case, the graph is more complex. I use a NN to decide whether or not to use the next layer of an MLP. However, when I try to backpropagate, I get the error.
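Roughly, the structure is like this (a sketch with made-up layer sizes, not my actual code):

```python
import torch
import torch.nn as nn

# Sketch: a small gating network decides, per sample, whether the next MLP
# layer is applied. Only the selected samples are sent through layer2; the
# rest are passed through unchanged.
gate_net = nn.Linear(8, 1)
layer1 = nn.Linear(8, 8)
layer2 = nn.Linear(8, 8)

x = torch.randn(16, 8)
h = torch.relu(layer1(x))

gate = torch.sigmoid(gate_net(h)).squeeze(1)       # per-sample "use layer2?" score
use_idx = (gate > 0.5).nonzero(as_tuple=True)[0]   # hard decision (not differentiable)
skip_idx = (gate <= 0.5).nonzero(as_tuple=True)[0]

out = torch.zeros_like(h)                          # no requires_grad on the buffer
out.index_add_(0, use_idx, layer2(h[use_idx]))     # computed only for selected samples
out.index_add_(0, skip_idx, h[skip_idx])           # skipped samples pass through

# Gradients flow to layer1/layer2 through the routed rows; the hard routing
# decision itself receives no gradient here.
out.sum().backward()
```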
Thanks

The code examples work and backpropagate correctly. Unless you provide code, I can’t help you any more than that.

If you don’t want to show your code, can you at least produce a minimal example that demonstrates the error?


Hi,
sorry about that; my code is very long. I am working on a very simple working example and will get back to you soon.
Thanks

If I use this approach to conditional computation in the “central” part of my model, it leads to an autodiff issue.

The out tensor here is a leaf node. In my case, the tensor into which I split my input batch and then recombine it is an intermediate node in the computational graph, since I perform further operations on it. Hence I initialize it with requires_grad=True.

As a result, when I try to use index_add_, it throws a RuntimeError:

a leaf Variable that requires grad is being used in an in-place operation.
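A minimal reproduction of the error (a sketch, not my actual model):

```python
import torch

# Creating the combine buffer as a leaf with requires_grad=True and then
# writing into it in-place raises the RuntimeError quoted above.
out = torch.zeros(5, 3, requires_grad=True)   # leaf that requires grad
index = torch.tensor([0, 2, 4])
src = torch.randn(3, 3, requires_grad=True)

out.index_add_(0, index, src)  # RuntimeError: a leaf Variable that requires grad
                               # is being used in an in-place operation.
```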

What would be a workaround for it? This thread suggests making a clone or editing the .data attribute of the variable directly: Leaf variable was used in an inplace operation

Would love to hear feedback from either of you too, @jpeg729 or @rahul, since this thread has been around for quite a while!

Thanks in advance :slight_smile:

EDIT: Never mind, I used torch.Tensor.index_add (the out-of-place version). However, I am still facing issues with training: my loss does not seem to change, so I suspect backprop is not working correctly.
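Roughly what the out-of-place call looks like now (sketch):

```python
import torch

# The out-of-place variant returns a new tensor instead of modifying `out`,
# so the leaf/in-place restriction no longer applies.
out = torch.zeros(5, 3, requires_grad=True)
index = torch.tensor([0, 2, 4])
src = torch.randn(3, 3, requires_grad=True)

combined = out.index_add(0, index, src)   # out-of-place: `out` is left untouched
combined.sum().backward()                 # gradients reach both `out` and `src`
```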

Hi,

It is completely OK to have out not require grad and then modify it in-place with something that does. That will make out require gradients, and the gradients will propagate as expected :slight_smile:
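For example, something along these lines (a sketch with arbitrary shapes):

```python
import torch
import torch.nn as nn

# `out` starts as a plain buffer without requires_grad; index_add_-ing values
# that do require grad makes `out` require grad, and backprop reaches the
# branch parameters as expected.
branch = nn.Linear(3, 3)
x = torch.randn(5, 3)
index = torch.tensor([0, 2, 4])

out = torch.zeros(5, 3)                      # no requires_grad here
out.index_add_(0, index, branch(x[index]))   # in-place with grad-requiring values
print(out.requires_grad)                     # True

out.sum().backward()
print(branch.weight.grad is not None)        # True: gradients propagated
```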

Note that if you are not sure if backprop is working properly for a function (and the function is small enough), you can always try to use gradcheck.
Be aware, though, that you must use double precision, your function must be smooth at the point where you’re checking the gradients, and if your function is too big, the numerical evaluation of the gradient might be too imprecise and the test might fail for no reason.
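A minimal gradcheck sketch for a small routing-style function (the function and shapes here are just placeholders):

```python
import torch
from torch.autograd import gradcheck

# gradcheck compares analytical gradients against numerical ones;
# inputs must be double precision.
def combine(x):
    index = torch.tensor([0, 2, 4])
    out = torch.zeros(5, 3, dtype=torch.double)
    out.index_add_(0, index, x * 2)
    return out

x = torch.randn(3, 3, dtype=torch.double, requires_grad=True)
print(gradcheck(combine, (x,)))   # True if the gradients match
```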

Thanks for your feedback. In your example, the index tensors (such as nn2_ind) have requires_grad = False, and as far as I can tell they cannot be set to requires_grad = True. Could that potentially be leading to issues with autodiff?
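For what it’s worth, a quick check (sketch) shows that integer index tensors indeed cannot be made to require grad:

```python
import torch

# Index tensors are integer (long) tensors, and only floating-point tensors
# can require grad, so requires_grad cannot be enabled on them.
idx = torch.tensor([0, 2, 4])      # dtype=torch.long
idx.requires_grad_(True)           # RuntimeError: only Tensors of floating point
                                   # dtype can require gradients
```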