How to correctly apply softmax to specific tensor elements?

For example, if I have a tensor a = [0.4, 0.1, 0.2, 0.1, 0.5], I want to apply softmax to the top 3 elements and set every other element to 0. For tensor a, that is [0.3420, 0.0000, 0.2800, 0.0000, 0.3780]. But when I implement this method in PyTorch, I get this error message in loss.backward(): “RuntimeError: leaf variable has been moved into the graph interior”

Here is my code:

import torch

def new_softmax(raw, k):
    # softmax over the top-k elements only; all other positions should end up 0
    values, idxs = raw.topk(k)
    all = sum(torch.exp(values))           # normalizer over the top-k values
    for i in idxs:
        raw[i] = torch.exp(raw[i]) / all   # in-place write into the input tensor
    one_hots = [int(j in idxs) for j in range(len(raw))]
    return raw * torch.tensor(one_hots).float()

a = torch.tensor([0.4, 0.1, 0.2, 0.1, 0.5], requires_grad=True)
b = new_softmax(a, 3)
sum(b).backward()   # RuntimeError: leaf variable has been moved into the graph interior

Hi,

This is because you modify the input of your function in place here, and since it is a Tensor that you created and that requires gradients (we call these leaf Tensors), this is not allowed.
You can add raw = raw.clone() at the beginning of your function to solve that problem.
Also, since you set to 0 all the values that should be 0 at the end of your function, I think it would be more efficient to apply a full softmax instead of the for loop.
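For reference, a minimal sketch of that one-line change (everything else stays exactly as in your code):

import torch

def new_softmax(raw, k):
    raw = raw.clone()                      # work on a copy, so the leaf tensor is never modified in place
    values, idxs = raw.topk(k)
    all = sum(torch.exp(values))           # normalizer over the top-k values only
    for i in idxs:
        raw[i] = torch.exp(raw[i]) / all   # the in-place write now hits the clone, not the leaf
    one_hots = [int(j in idxs) for j in range(len(raw))]
    return raw * torch.tensor(one_hots).float()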

Hi, albanD:

Thanks for your reply. I followed your instruction and it works! But I didn’t understand what the sentence “Also, since you set to 0 all the values that should be 0 at the end of your function, I think it would be more efficient to apply a full softmax instead of the for loop.” means. Do you mean I could directly apply nn.Softmax() to get what I want and then multiply raw by one_hots?

Finally, thank you again for your kind help.

Do you mean I could directly apply nn.Softmax() to get what I want and then multiply raw by one_hots?

From what I understand of your code, yes.
The only point of the for loop compared to a full Softmax() is not to modify the values that you don’t want to compute the softmax for, right? If so, you can actually change these values to anything, given that you set them to 0 later.
Also, using a regular softmax removes the in-place assignment raw[i] = xxx, so you won’t need the clone() anymore.
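Roughly something like this (untested sketch, using dim=0 for a 1D input):

import torch

def new_softmax(raw, k):
    probs = torch.softmax(raw, dim=0)      # full softmax, no in-place ops, no clone needed
    _, idxs = raw.topk(k)
    mask = torch.zeros_like(raw)           # mask does not require grad
    mask[idxs] = 1.0                       # keep only the top-k positions
    return probs * mask                    # everything outside the top-k becomes 0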

To my understanding, these two methods are different.
For example, say we have a tensor a = tensor([0.0000, 0.5000, 0.0000, 0.0000, 0.7000]). If I only want the top-2 softmax result for this tensor, the result should be tensor([0.0000, 0.4502, 0.0000, 0.0000, 0.5498]). But if I apply nn.Softmax() first and then set the values I don’t want to 0, the calculation is:
softmax([0.0000, 0.5000, 0.0000, 0.0000, 0.7000]) = tensor([0.1501, 0.2475, 0.1501, 0.1501, 0.3023]) -> tensor([0.0000, 0.2475, 0.0000, 0.0000, 0.3023])
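Here is the same check in code (a quick comparison using torch.topk and torch.softmax on that example):

import torch

a = torch.tensor([0.0, 0.5, 0.0, 0.0, 0.7])
values, idxs = a.topk(2)

# softmax over the top-2 values only, other positions stay 0
top2 = torch.zeros_like(a)
top2[idxs] = torch.softmax(values, dim=0)
print(top2)          # tensor([0.0000, 0.4502, 0.0000, 0.0000, 0.5498])

# full softmax first, then zero out everything outside the top 2
full = torch.softmax(a, dim=0)
mask = torch.zeros_like(a)
mask[idxs] = 1.0
print(full * mask)   # tensor([0.0000, 0.2475, 0.0000, 0.0000, 0.3023])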

If I am wrong, please correct me.
Thank you again! :grinning:

Oh sorry, I missed the part where you compute the sum only over the top-k results. So yes, they are indeed different!