Hi everyone,
I have a model trained in PyTorch, which has been serialized and imported in C++ for inference.
Let’s say the NN has n_in inputs and n_out outputs. For every sample, I would like to access the derivative of all the outputs with respect to the inputs. What I am doing is the following:

// define the input
torch::Tensor x = torch::rand({1, n_in});
x.set_requires_grad(true);
// wrap it in an IValue for the serialized module
std::vector<torch::jit::IValue> inputs;
inputs.push_back(x);
// compute the output
torch::Tensor output = model.forward(inputs).toTensor();
// compute the derivatives of every output j with respect to x
for (unsigned j = 0; j < n_out; j++) {
    // index 0 because the batch has size 1;
    // retain_graph=true keeps the graph alive for the next j
    output[0][j].backward({}, /*retain_graph=*/true);
    torch::Tensor der_j = x.grad();
    /*
    ...
    */
}

The problem is that, this way, the gradients are accumulated into x.grad() at each call of output[0][j].backward(), so der_j holds the sum of the gradients of outputs 0 through j rather than the gradient of output j alone. Since I am not using an optimizer, I cannot call zero_grad() on it. Should I read the gradient of x after each backward call and set it to zero manually? Or are there better ways of doing this?

Since the C++ API is pretty close to the Python one, could you try the autograd.grad function, as described here?

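autograd.grad returns the gradients as new tensors instead of accumulating them into x.grad, as happens in your case.
Its grad_outputs argument is a weight vector with the shape of the output; it is usually set to "ones", and it can be omitted when the output is a scalar.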
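
To make the suggestion concrete, here is a minimal sketch in Python (the API the C++ one mirrors). The nn.Linear layer is only a hypothetical stand-in for the trained network, and n_in = 3, n_out = 2 are arbitrary; recent LibTorch versions expose a torch::autograd::grad counterpart, but I have not checked which release introduced it, so treat that as an assumption.

```python
import torch

torch.manual_seed(0)
n_in, n_out = 3, 2  # arbitrary sizes for illustration

# Hypothetical stand-in for the trained network.
model = torch.nn.Linear(n_in, n_out)

x = torch.rand(1, n_in, requires_grad=True)
output = model(x)  # shape (1, n_out)

rows = []
for j in range(n_out):
    # Returns d output[0, j] / d x as a new tensor; nothing is written to x.grad.
    # retain_graph=True keeps the graph usable for the next output component.
    (grad_j,) = torch.autograd.grad(output[0, j], x, retain_graph=True)
    rows.append(grad_j[0])

jacobian = torch.stack(rows)  # shape (n_out, n_in), one row per output
```

Because nothing accumulates in x.grad, there is no need to zero anything between iterations. For this linear stand-in the resulting Jacobian is simply the weight matrix, which makes the result easy to check.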

Another option I found comes from the C++ documentation, which describes how Autograd works. However, the example given there does not work, because the variable c is not a scalar:

#include <torch/csrc/autograd/variable.h>
#include <torch/csrc/autograd/function.h>
torch::Tensor a = torch::ones({2, 2}, torch::requires_grad());
torch::Tensor b = torch::randn({2, 2});
auto c = a + b;
c.backward(); // a.grad() will now hold the gradient of c w.r.t. a.

To make this work, one has to call backward on each component, e.g.:

c[0][0].backward();

and then reset the gradients, so that a subsequent backward call yields the gradients w.r.t. a different output component:

a.grad().zero_();
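
For comparison, the same backward-then-zero loop can be sketched in Python. This is only an illustration of the pattern, not the documentation's example: the grads dictionary is a name I made up, and retain_graph=True is passed as a precaution so the graph is never freed between calls (for a plain addition it happens to be optional).

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = torch.randn(2, 2)
c = a + b  # non-scalar result, so c.backward() alone would raise an error

grads = {}  # hypothetical container: (i, j) -> gradient of c[i, j] w.r.t. a
for i in range(2):
    for j in range(2):
        # Clear what the previous backward call left in a.grad.
        if a.grad is not None:
            a.grad.zero_()
        # Backpropagate a single scalar component of c.
        c[i, j].backward(retain_graph=True)
        # Clone, because a.grad is modified in place on the next iteration.
        grads[(i, j)] = a.grad.clone()
```

Each stored gradient is zero everywhere except at position (i, j), where it is one, since c[i][j] depends only on a[i][j].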

I think the documentation should be updated to reflect the fact that calling backward on a non-scalar variable is no longer allowed in LibTorch (since version 1.3, if I remember correctly), as was already the case in the Python version.

I’m glad you figured it out by yourself @lui.bonati.
So, by default, backward works only on a single-value (scalar) tensor. For such a tensor c:

c.backward() == c.backward(torch.ones_like(c))

If you want to apply it to a vector, you need to pass a gradient argument to backward.
Let’s say x is a non-scalar tensor that is correctly set up (it is the result of a computation that tracks gradients) and holds the data you want to use.

ones = torch.ones_like(x) # create a tensor full of ones with the shape of x
x.backward(ones)

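The ones vector is not the variable you differentiate with respect to: it is the weight applied to each component of x before the gradients are summed (the vector in the vector-Jacobian product). So it can be any tensor with the shape of x; a one-hot vector, for instance, selects the gradient of a single component.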
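
A small sketch of what the argument to backward does, under my own choice of names: here the leaf input is called x and the non-scalar result y (in the snippet above, x plays the role of y), and y = x ** 2 is just a convenient function whose derivative is easy to check by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2  # non-scalar output; dy_i/dx_i = 2 * x_i

# Weighting every component by one gives the gradient of y.sum().
y.backward(torch.ones_like(y), retain_graph=True)
grad_all = x.grad.clone()

# Reset, then weight only the second component (a one-hot vector).
x.grad.zero_()
v = torch.tensor([0.0, 1.0, 0.0])
y.backward(v)
grad_one = x.grad.clone()

print(grad_all)  # tensor([2., 4., 6.])
print(grad_one)  # tensor([0., 4., 0.])
```

Looping over the one-hot vectors therefore recovers the Jacobian one row at a time, which is another way to get the per-output derivatives asked about at the top of the thread.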