How to correctly apply softmax to specific tensor elements?

For example, if I have a tensor a = [0.4, 0.1, 0.2, 0.1, 0.5], I want to apply softmax to the top 3 elements and set every other element to 0. For tensor a, that is [0.3420, 0.0000, 0.2800, 0.0000, 0.3780]. But when I implement this method in PyTorch, I get this error message in loss.backward(): “RuntimeError: leaf variable has been moved into the graph interior”

Here is my code:

import torch

def new_softmax(raw, k):
    # softmax over the top-k elements only; all other positions should end up 0
    values, idxs = raw.topk(k)
    all = sum(torch.exp(values))           # normalizer over the top-k values
    for i in idxs:
        raw[i] = torch.exp(raw[i]) / all   # in-place write into the input tensor
    one_hots = [int(j in idxs) for j in range(len(raw))]
    return raw * torch.tensor(one_hots).float()

a = torch.tensor([0.4, 0.1, 0.2, 0.1, 0.5], requires_grad=True)
b = new_softmax(a, 3)
sum(b).backward()   # RuntimeError: leaf variable has been moved into the graph interior

Hi,

This is because you modify the input of your function in place here, and since it is a Tensor that you created and that requires gradients (we call these leaf Tensors), this is not allowed.
You can add raw = raw.clone() at the beginning of your function to solve that problem.
Also, since you set to 0 all the values that should be 0 at the end of your function, I think it would be more efficient to apply a full softmax instead of the for loop.
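For reference, a minimal sketch of that one-line change (everything else stays exactly as in your code):

import torch

def new_softmax(raw, k):
    raw = raw.clone()                      # work on a copy, so the leaf tensor is never modified in place
    values, idxs = raw.topk(k)
    all = sum(torch.exp(values))           # normalizer over the top-k values only
    for i in idxs:
        raw[i] = torch.exp(raw[i]) / all   # the in-place write now hits the clone, not the leaf
    one_hots = [int(j in idxs) for j in range(len(raw))]
    return raw * torch.tensor(one_hots).float()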

Hi, albanD:

Thanks for your reply. I followed your instruction and it works! But I didn’t understand what the sentence “Also, since you set to 0 all the values that should be 0 at the end of your function, I think it would be more efficient to apply a full softmax instead of the for loop.” means. Do you mean I could directly apply nn.Softmax() to get what I want and then multiply raw by one_hots?

Finally, thank you again for your kind help.

Do you mean I could directly apply nn.Softmax() to get what I want and then multiply raw by one_hots?

From what I understand of your code, yes.
The only point of the for loop compared to a full Softmax() is not to modify the values that you don’t want to compute the softmax for, right? If so, you can actually change these values to anything, given that you set them to 0 later.
Also, using a regular softmax removes the in-place assignment raw[i] = xxx, so you won’t need the clone() anymore.
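Roughly something like this (untested sketch, using dim=0 for a 1D input):

import torch

def new_softmax(raw, k):
    probs = torch.softmax(raw, dim=0)      # full softmax, no in-place ops, no clone needed
    _, idxs = raw.topk(k)
    mask = torch.zeros_like(raw)           # mask does not require grad
    mask[idxs] = 1.0                       # keep only the top-k positions
    return probs * mask                    # everything outside the top-k becomes 0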

To my understanding, these two methods are different.
For example, say we have a tensor a = tensor([0.0000, 0.5000, 0.0000, 0.0000, 0.7000]). If I only want the top-2 softmax result for this tensor, the result should be tensor([0.0000, 0.4502, 0.0000, 0.0000, 0.5498]). But if I apply nn.Softmax() first and then set the values I don’t want to 0, the calculation is:
softmax([0.0000, 0.5000, 0.0000, 0.0000, 0.7000]) = tensor([0.1501, 0.2475, 0.1501, 0.1501, 0.3023]) -> tensor([0.0000, 0.2475, 0.0000, 0.0000, 0.3023])
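Here is the same check in code (a quick comparison using torch.topk and torch.softmax on that example):

import torch

a = torch.tensor([0.0, 0.5, 0.0, 0.0, 0.7])
values, idxs = a.topk(2)

# softmax over the top-2 values only, other positions stay 0
top2 = torch.zeros_like(a)
top2[idxs] = torch.softmax(values, dim=0)
print(top2)          # tensor([0.0000, 0.4502, 0.0000, 0.0000, 0.5498])

# full softmax first, then zero out everything outside the top 2
full = torch.softmax(a, dim=0)
mask = torch.zeros_like(a)
mask[idxs] = 1.0
print(full * mask)   # tensor([0.0000, 0.2475, 0.0000, 0.0000, 0.3023])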

If I am wrong, please correct me.
Thank you again! :grinning:

Oh sorry, I missed the part where you compute the sum only over the top-k results. So yes, they are indeed different!