I am hoping to get Jacobians in a way that respects the batch, efficiently
Given a batch of b (vector) predictions y_1,…,y_b, and inputs x_1 … x_b, I want to compute the Jacobians of y_i wrt x_i. In other words, I want a Jacobian of the output wrt input for each pair in the batch.
One might try the following:
import torch import torch.nn as nn # Load the experimental api # https://github.com/pytorch/pytorch/blob/master/torch/autograd/functional.py from experimental_api import jacobian in_dim = 5 batch_size = 3 hidden_dim = 2 out_dim = 10 f = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid(), nn.Linear(hidden_dim, out_dim)) x = torch.randn(batch_size, in_dim) result = jacobian(f, x) result.shape
torch.Size([3, 10, 3, 5])
In this case, I wanted a
result with shape
(batch_size, out_dim, in_dim) = (3, 10, 5) but instead got extra dimensions. In fact, PyTorch is considering
x as just one input matrix rather than a batch of several vectors. This leads to redundant calculation such as the derivatives of target 2 wrt input 1.
Clearly, I could do a loop like
jacobian(f, x_row) for x_row in x but that would no longer use the GPU effectively. Can anyone propose an efficient solution?