I’m confused about the index put operation when the given indices contain duplicates. Here is a code snippet:
```python
import torch
torch.manual_seed(0)
mapping = torch.randn((3,3))
a = torch.ones((3), requires_grad=True)
index = torch.tensor([0,1,1,1])

# Version 1
b = a @ mapping
b[index] += 1
print("index put with duplicated indices", b)
loss = b.sum()
loss.backward()
print("a.grad", a.grad)

# Version 2
a.grad = None
b = a @ mapping
b[torch.unique(index)] += 1
print("index put with unique indices", b)
loss = b.sum()
loss.backward()
print("a.grad", a.grad)
```
The outputs are:
```
index put with duplicated indices tensor([ 3.5128, 0.4601, -4.2966], grad_fn=<IndexPutBackward0>)
a.grad tensor([-1.5181, -4.0837, 2.1982])
index put with unique indices tensor([ 3.5128, 0.4601, -4.2966], grad_fn=<IndexPutBackward0>)
a.grad tensor([-0.9312, -1.9147, 0.5221])
```
So the forward results of these two index put operations are identical, while the gradients with respect to `a` are different. I’m confused about this phenomenon.
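One way to narrow this down is to look at the gradient with respect to `b` itself (before the `+= 1`) instead of `a`. The sketch below assumes the standard Python expansion of `b[index] += 1` into `tmp = b[index]; tmp += 1; b[index] = tmp`, i.e. a gather followed by a non-accumulating `index_put_`. With duplicates, the write stores the same value at position 1 three times, but in the backward pass each of the three gathered copies presumably receives the output gradient for position 1, and the gather's backward sums them back, giving 3 instead of 1. The helper name `grads` and the extra `clone()` (used only so the pre-update tensor can keep its `.grad`) are illustrative additions, not part of the original snippet.

```python
import torch

torch.manual_seed(0)
mapping = torch.randn((3, 3))
index = torch.tensor([0, 1, 1, 1])

def grads(idx):
    """Hypothetical helper: returns (grad w.r.t. b before the update, grad w.r.t. a)."""
    a = torch.ones(3, requires_grad=True)
    b_pre = a @ mapping      # b before the indexed update
    b_pre.retain_grad()      # keep the gradient of this non-leaf tensor
    b = b_pre.clone()        # clone so the in-place write does not modify b_pre
    b[idx] += 1              # gather, add 1, then index_put_ (accumulate=False)
    b.sum().backward()
    return b_pre.grad, a.grad

g_dup, ga_dup = grads(index)
g_uni, ga_uni = grads(torch.unique(index))
print(g_dup)  # tensor([1., 3., 1.])
print(g_uni)  # tensor([1., 1., 1.])

# Since b = a @ mapping, the gradient w.r.t. a is mapping @ (gradient w.r.t. b)
print(torch.allclose(ga_dup, mapping @ g_dup))  # True
print(torch.allclose(ga_uni, mapping @ g_uni))  # True
```

If that reading is right, the Version 1 `a.grad` is `mapping @ [1, 3, 1]` (the row sums of `mapping` plus twice its second column), while Version 2 gives `mapping @ [1, 1, 1]`, which matches the two printed gradients above.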