Hi,
does the C++ API have a function equivalent to the torch.autograd.functional.jacobian function from PyTorch? If there is, I cannot find it.
Best,
Simon
So if we wanted the Jacobian, we'd have to call torch::autograd::grad, right?
I've been playing around with what this looks like. This is what I have so far; does this seem correct?
#include <torch/script.h> // One-stop header.
#include <torch/torch.h>

#include <iostream>
#include <memory>

int main() {
  // model with 4 input dimensions
  // and 10 output classes
  auto model = torch::nn::Linear(4, 10);

  // sample input
  auto input = torch::ones({1, 4}).requires_grad_(true);

  // forward pass on the input
  auto out = model(input);
  auto grad_output = torch::ones({/*batch=*/1, 1}).to(out);

  // compute the jacobian of out wrt input,
  // one output class (= one Jacobian row) per iteration
  for (int i = 0; i < 10; i++) {
    auto gradient = torch::autograd::grad({out.slice(/*dim=*/1, /*start=*/i,
                                                     /*end=*/i + 1)},
                                          {input},
                                          /*grad_outputs=*/{grad_output},
                                          /*retain_graph=*/true,
                                          /*create_graph=*/true);
    // print the gradient of out[:][i] wrt input
    std::cout << i << '\n';
    std::cout << gradient[0] << '\n';
  }
  return 0;
}
EDIT: I've changed auto grad_output = torch::ones_like(out);
to auto grad_output = torch::ones({/*batch=*/1, 1}).to(out);
because I noticed my gradients were off by a factor of 10…
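To make the factor of 10 concrete: as far as I understand, torch::autograd::grad computes a vector-Jacobian product vᵀJ, where v is the grad_outputs tensor. An all-ones v over the 10 output classes therefore returns the sum of all Jacobian rows instead of a single row. A minimal sketch of the difference (same 4-in/10-out shapes as above):

#include <torch/torch.h>
#include <iostream>

int main() {
  auto model = torch::nn::Linear(4, 10);
  auto input = torch::ones({1, 4}).requires_grad_(true);
  auto out = model(input); // shape {1, 10}

  // v = all ones over the 10 outputs -> grad is the SUM of all Jacobian rows
  auto summed = torch::autograd::grad({out}, {input},
                                      {torch::ones_like(out)},
                                      /*retain_graph=*/true)[0];

  // v = one-hot on output 0 -> grad is the single Jacobian row for class 0
  auto one_hot = torch::zeros_like(out);
  one_hot.index_put_({0, 0}, 1);
  auto row0 = torch::autograd::grad({out}, {input}, {one_hot})[0];

  std::cout << "sum of all rows:\n" << summed << '\n';
  std::cout << "row 0 only:\n" << row0 << '\n';
  return 0;
}

With a one-hot v you recover exactly one Jacobian row per backward call, which is essentially what torch.autograd.functional.jacobian does per output element on the Python side.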
I played a bit with your code. You can also feed the full output to the grad function and use a different grad_output for each iteration. This seems to be faster for larger sizes, but in practice there shouldn't be much of a difference. I also don't know whether this simple time measurement is a fair comparison.
#include <torch/torch.h>

#include <chrono>
#include <iostream>

// define sizes
unsigned int batch_size = 100;
unsigned int input_dim = 100;
int output_dim = 1000;

// a simple nonlinear model: exp(input * M)
// note: uses the global output_dim
torch::Tensor model(const torch::Tensor& input)
{
  unsigned int input_dim = input.sizes().back();
  auto M = torch::linspace(0, 1, input_dim * output_dim).view({input_dim, output_dim});
  return torch::exp(input.matmul(M));
}

int main() {
  // model (uncomment if a linear model should be used instead of the function above)
  //auto model = torch::nn::Linear(input_dim, output_dim);

  // sample input
  auto input = torch::linspace(0, 1, batch_size * input_dim)
                   .view({batch_size, input_dim})
                   .requires_grad_(true);
  std::cout << "input size: " << input.sizes() << std::endl;

  // forward pass on the input
  auto out = model(input);
  std::cout << "output size: " << out.sizes() << std::endl;

  // preallocate the jacobian
  auto J = torch::ones({batch_size, output_dim, input_dim});

  // preallocate time variables
  auto start = std::chrono::system_clock::now();
  auto end = std::chrono::system_clock::now();
  auto elapsed = end - start;

  // variant 1: compute the jacobian of out wrt input by feeding the full
  // output and a one-hot grad_output for each output dimension
  start = std::chrono::system_clock::now();
  for (int i = 0; i < output_dim; i++)
  {
    auto grad_output = torch::zeros({output_dim});
    grad_output.index_put_({i}, 1);
    grad_output = grad_output.expand({batch_size, output_dim});
    auto gradient = torch::autograd::grad({out},
                                          {input},
                                          /*grad_outputs=*/{grad_output},
                                          /*retain_graph=*/true,
                                          /*create_graph=*/true);
    auto grad = gradient[0].unsqueeze(/*dim=*/1);
    J.slice(1, i, i + 1) = grad;
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "Jacobian: " << std::endl;
  //std::cout << J << std::endl;
  std::cout << "duration: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
            << " ms" << std::endl;

  // variant 2: compute the jacobian of out wrt input by slicing the output
  // and using an all-ones grad_output of size {batch_size, 1}
  start = std::chrono::system_clock::now();
  for (int i = 0; i < output_dim; i++)
  {
    auto grad_output = torch::ones({1}).expand({batch_size, 1});
    auto gradient = torch::autograd::grad({out.slice(/*dim=*/1, /*start=*/i,
                                                     /*end=*/i + 1)},
                                          {input},
                                          /*grad_outputs=*/{grad_output},
                                          /*retain_graph=*/true,
                                          /*create_graph=*/true);
    auto grad = gradient[0].unsqueeze(/*dim=*/1);
    J.slice(1, i, i + 1) = grad;
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "Jacobian: " << std::endl;
  //std::cout << J << std::endl;
  std::cout << "duration: "
            << std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
            << " ms" << std::endl;

  return 0;
}
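Since there doesn't seem to be a built-in C++ equivalent of torch.autograd.functional.jacobian yet, the loop above could be wrapped into a small reusable helper. This is only a sketch under this thread's assumptions (2-D {batch, dim} input and output, per-sample independence across the batch); compute_jacobian is a made-up name, not a LibTorch API:

#include <torch/torch.h>

// Hypothetical helper (not part of LibTorch): batched Jacobian of out wrt
// input. Assumes out is {batch, out_dim} and input is {batch, in_dim}, with
// each sample independent of the others, so the result has shape
// {batch, out_dim, in_dim}.
torch::Tensor compute_jacobian(const torch::Tensor& out, const torch::Tensor& input)
{
  auto batch = out.size(0);
  auto out_dim = out.size(1);
  auto in_dim = input.size(1);
  auto J = torch::zeros({batch, out_dim, in_dim});
  for (int64_t i = 0; i < out_dim; ++i)
  {
    // one-hot grad_output selects Jacobian row i for every sample in the batch
    auto v = torch::zeros({batch, out_dim});
    v.slice(/*dim=*/1, i, i + 1).fill_(1);
    auto g = torch::autograd::grad({out}, {input}, {v},
                                   /*retain_graph=*/true)[0];
    J.slice(/*dim=*/1, i, i + 1) = g.unsqueeze(/*dim=*/1);
  }
  return J;
}

Usage would then be along the lines of auto J = compute_jacobian(model(input), input); with input built with .requires_grad_(true).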
@cjekel
You don't need .to(out); torch::ones({1, 1}) creates a 1x1 grad-output tensor filled with 1, and that's enough. Your original torch::ones_like(out) ends up as a tensor of ten 1s, so your calculated gradients get summed up, which makes them 10 times bigger.
What's the speed of doing this iteratively like this?
Hi, is there any update on a torch.autograd.functional.jacobian equivalent in C++ libtorch?
We are developing a nonlinear algorithm that needs this operation; we hope it can be provided so users can call it directly.