What do the dimensions of the output of torch.autograd.functional.jacobian represent?

Hello! I am trying to get the Jacobian of the LeNet5 model using torch.autograd.functional.jacobian. However, I do not understand what the dimensions of the output represent.

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class LeNet5(nn.Module):

    def __init__(self, n_classes):
        super(LeNet5, self).__init__()
        
        self.feature_extractor = nn.Sequential(            
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),

        )

        self.classifier = nn.Sequential(
            nn.Linear(in_features = 400, out_features = 120),
            nn.Tanh(),
            nn.Linear(in_features=120, out_features=84),
            nn.Tanh(),
            nn.Linear(in_features=84, out_features=n_classes),
        )


    def forward(self, x):
        x = self.feature_extractor(x)
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        probs = F.softmax(logits, dim=1)  # computed but unused; the logits are what get returned
        return logits

I used this tensor as a test input, with dims = (batch size, input channels, height, width)

model = LeNet5(n_classes=10)  # n_classes = 10 matches the second dimension of the Jacobian below
inputs = torch.rand(size=(12, 1, 32, 32))

When I use this to get the Jacobian

jacobian = torch.autograd.functional.jacobian(model, inputs, create_graph=False, strict=False, vectorize=True)

I get a tensor with size = (12, 10, 12, 1, 32, 32), where the 1st and 3rd dimensions (12 and 12) both match the outermost dimension of the input (the batch size). Why are there two 12s in the dimensions of the Jacobian? Thank you very much!

torch.autograd.functional.jacobian differentiates the entire output Tensor with respect to the entire input Tensor, so the batch dimension gets duplicated in the result: the Jacobian's shape is simply the output shape followed by the input shape. For example, if you had an input of shape [B, N] for an arbitrary function whose output also has shape [B, N] (assuming the function returns a single Tensor, of course), you'd get a Jacobian of shape [B, N, B, N]. In your case the output (the logits) has shape (12, 10) and the input has shape (12, 1, 32, 32), and concatenating the two gives (12, 10, 12, 1, 32, 32).
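
As a quick sanity check of that shape rule, here is a minimal sketch with a hypothetical stand-in function (a single Linear layer, not your model):

f = nn.Linear(4, 3)                           # maps [B, 4] -> [B, 3]
x = torch.rand(2, 4)                          # B = 2
J = torch.autograd.functional.jacobian(f, x)
print(J.shape)                                # torch.Size([2, 3, 2, 4]) = output shape + input shape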

If your samples are independent along the batch dimension, you can drop the duplicated batch dimension by taking its diagonal with torch.einsum. For the example above, with a Jacobian of shape [B, N, B, N], you could remove the second B dimension via,

new_jacobian = torch.einsum("bibj->bij", jacobian)
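
For your LeNet5 case, assuming the result of the call above is stored in jacobian with shape (12, 10, 12, 1, 32, 32), the same trick would look like this (the index letters are just labels I'm choosing: b = batch, o = class logit, c/h/w = input channel/height/width):

per_sample_jacobian = torch.einsum("bobchw->bochw", jacobian)
print(per_sample_jacobian.shape)  # torch.Size([12, 10, 1, 32, 32])

Entry [b] of per_sample_jacobian is then the Jacobian of sample b's 10 logits with respect to that same sample's 1x32x32 input.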