# Efficient implementation of Jacobian of Softmax

Given a one-dimensional input tensor `S`, I want to evaluate the following expression:

`J_ij = S_i(delta_ij - S_j)`

where `delta_ij` is the Kronecker delta. The result `J` of this expression is a square matrix. How can I efficiently evaluate this expression using PyTorch? My current implementation is very slow and looks as follows:

```python
import torch

S = torch.arange(start=1, end=10).float()
I = torch.eye(S.size(0))
S = S.repeat(S.size(0), 1)
J = S.T * (I - S)
```

I would be very happy if someone could give me a hint.

Hi Samuel!

The following expression is more compact and likely faster:

```python
J = torch.diag(S) - torch.outer(S, S)
```
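For example, applying this formula to the output of a softmax (where the expression `J_ij = S_i(delta_ij - S_j)` is the softmax Jacobian), a quick sketch:

```python
import torch

# S must be the *output* of a softmax for J to be the softmax Jacobian
logits = torch.tensor([1.0, 2.0, 3.0])
S = torch.softmax(logits, dim=0)

# J_ij = S_i * (delta_ij - S_j)  ==  diag(S) - outer(S, S)
J = torch.diag(S) - torch.outer(S, S)

# sanity checks: J is symmetric and its rows sum to zero
print(torch.allclose(J, J.T))                                    # True
print(torch.allclose(J.sum(dim=1), torch.zeros(3), atol=1e-6))   # True
```

The row sums vanish because `sum_j S_i(delta_ij - S_j) = S_i(1 - sum_j S_j) = 0` when `S` sums to one.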

Best.

K. Frank

Does this also work for batched versions of `S`? For example, if `S` has dimensions `(batch_size, num_outputs)`? Is the result correct if I use

`J = torch.diag_embed(S) - torch.outer(S, S)`

Hi Samuel!

No. If you had tried it, you would have discovered that `torch.outer()`
does not accept multidimensional tensors, so your batched expression will
throw an error.

You can, however, use PyTorch's Swiss-army knife of tensor
multiplication functions, `einsum()`, to construct a batched version of the
outer product:

```python
>>> import torch
>>> torch.__version__
'1.9.0'
>>> S = torch.arange(6).reshape(2, 3).float()
>>> S
tensor([[0., 1., 2.],
        [3., 4., 5.]])
>>> torch.diag_embed(S) - torch.einsum('ij, ik -> ijk', S, S)
tensor([[[  0.,   0.,   0.],
         [  0.,   0.,  -2.],
         [  0.,  -2.,  -2.]],

        [[ -6., -12., -15.],
         [-12., -12., -20.],
         [-15., -20., -20.]]])
```
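As an alternative, the same batched outer product can be written with broadcasting via `unsqueeze()`, in case you prefer to avoid `einsum()`. A quick sketch:

```python
import torch

S = torch.arange(6).reshape(2, 3).float()

# batched outer product via broadcasting: (B, N, 1) * (B, 1, N) -> (B, N, N)
J = torch.diag_embed(S) - S.unsqueeze(2) * S.unsqueeze(1)

# this matches the einsum formulation
J_einsum = torch.diag_embed(S) - torch.einsum('ij, ik -> ijk', S, S)
print(torch.equal(J, J_einsum))  # True
```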

(As an aside, none of this has anything to do with the title you gave
this thread, namely "Jacobian of Softmax.")
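That said, if `S` is the output of a softmax, the formula can be checked against the Jacobian that autograd computes. A sketch using `torch.autograd.functional.jacobian()`:

```python
import torch

logits = torch.tensor([0.5, -1.0, 2.0])
S = torch.softmax(logits, dim=0)

# analytic Jacobian of softmax with respect to the logits
J_analytic = torch.diag(S) - torch.outer(S, S)

# autograd Jacobian of softmax at the same point
J_autograd = torch.autograd.functional.jacobian(
    lambda x: torch.softmax(x, dim=0), logits)

print(torch.allclose(J_analytic, J_autograd, atol=1e-6))  # True
```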

Best.

K. Frank
