Efficient implementation of Jacobian of Softmax

Given a one-dimensional input tensor S, I want to evaluate the following expression:

J_ij = S_i(delta_ij - S_j)

where delta_ij is the Kronecker delta. The result J is a square matrix. How can I efficiently evaluate this expression in PyTorch? My current implementation is very slow and looks as follows:

import torch

S = torch.arange(start=1, end=10).float()   # cast to float so it mixes with torch.eye()
I = torch.eye(S.size()[0])                  # identity matrix, i.e. the Kronecker delta
S = S.repeat(S.size()[0], 1)                # every row is a copy of S, so S[i][j] = S_j
J = S.T * (I - S)                           # J[i][j] = S_i * (delta_ij - S_j)

I would be very happy if someone could give me a hint.

Hi Samuel!

The following expression is more compact and likely faster:

J = torch.diag (S) - torch.outer (S, S)
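
For example, a quick sanity check (just a sketch, reusing the 1-d float S from your post) that this agrees with your repeat-based version:

import torch

S = torch.arange(start=1, end=10).float()

J_fast = torch.diag(S) - torch.outer(S, S)   # compact version

I = torch.eye(S.size()[0])
S_rep = S.repeat(S.size()[0], 1)
J_slow = S_rep.T * (I - S_rep)               # original repeat-based version

print(torch.allclose(J_fast, J_slow))        # expected: True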

Best.

K. Frank

Does this also work for batched versions of S? For example, if S has dimensions (batch_size, num_outputs)? Is the result correct if I use

J = torch.diag_embed(S) - torch.outer(S, S)

Hi Samuel!

No. If you had tried it, you would have discovered that torch.outer()
does not accept multidimensional tensors, so that expression will throw
an error.
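
A quick way to see this (a sketch; the exact error text may differ
between versions):

import torch

S = torch.arange (6).reshape (2, 3).float()
try:
    torch.outer (S, S)                       # torch.outer() requires 1-d tensors
except RuntimeError as err:
    print ('torch.outer() rejected the 2-d input:', err)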

You can, however, use pytorch’s swiss army knife of tensor
multiplication functions to construct a batch version of outer:

>>> import torch
>>> torch.__version__
'1.9.0'
>>> S = torch.arange (6).reshape (2, 3).float()
>>> S
tensor([[0., 1., 2.],
        [3., 4., 5.]])
>>> torch.diag_embed (S) - torch.einsum ('ij, ik -> ijk', S, S)
tensor([[[  0.,   0.,   0.],
         [  0.,   0.,  -2.],
         [  0.,  -2.,  -2.]],

        [[ -6., -12., -15.],
         [-12., -12., -20.],
         [-15., -20., -20.]]])
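
For what it's worth, the same batched outer product can also be written
with plain broadcasting (a sketch reusing the S above; it should give
the same result as the einsum() version):

J_einsum = torch.diag_embed (S) - torch.einsum ('ij, ik -> ijk', S, S)
J_bcast  = torch.diag_embed (S) - S.unsqueeze (2) * S.unsqueeze (1)   # broadcasted outer product
print (torch.allclose (J_einsum, J_bcast))                            # expected: True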

(As an aside, none of this has anything to do with the title you gave
this thread, namely “Jacobian of Softmax.”)

Best.

K. Frank
