torch.index_put on GPU with accumulate=True

I’ve seen two posts about index_put on GPU: here and here. They are both marked as solved, but it’s not working for me. It’s fine on the CPU, and on GPU with accumulate=False, but when accumulate=True I get

RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1639180588308/work/aten/src/ATen/native/cuda/":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor251

Steps to reproduce:

import torch

t1 = torch.zeros(10).long().cuda()
t2 = torch.randint(10, size=(25,)).cuda()
# values is a single-element tensor here, while t2 holds 25 indices
t1.index_put(indices=[t2], values=torch.tensor(1), accumulate=True)

PyTorch version: 1.10.1

I cannot reproduce the issue on the latest nightly. Could you update and try again, please?

Hi, yes, I just tried and it works fine on the nightly build.

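t1 = torch.zeros(10).cuda()
t2 = torch.randint(10, size=(25,)).cuda()
print("T2 is {0}".format(t2))
# index_put is out-of-place, so keep the returned tensor; values has one entry per index
counts = t1.index_put(indices=[t2], values=torch.ones(25).cuda(), accumulate=True)
print("Counts are {0}".format(counts))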

The use case is that I want to count how many times each number from 0 to 9 occurs in the 25-element tensor t2. It’s a bit counterintuitive to have to allocate a new tensor of 25 ones to do that, but this does work, so thanks.
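For this particular counting use case, a minimal sketch of an alternative is torch.bincount, which tallies occurrences directly and needs no separate tensor of ones. The device fallback and the names counts and counts_via_index_put are illustrative assumptions, not part of the original posts; the index_put call is included only to compare the two approaches.

```python
import torch

# Assumption: use the GPU when one is available, otherwise fall back to CPU
# so the sketch still runs on machines without CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"

t2 = torch.randint(10, size=(25,), device=device)

# bincount counts how often each non-negative integer occurs in t2;
# minlength=10 guarantees one slot for every value 0..9, even if some never appear.
counts = torch.bincount(t2, minlength=10)

# The index_put formulation from the thread, with one value per index.
t1 = torch.zeros(10, dtype=torch.long, device=device)
counts_via_index_put = t1.index_put((t2,), torch.ones_like(t2), accumulate=True)

print(counts)
print(torch.equal(counts, counts_via_index_put))
```

Both results are int64 tensors of length 10 whose entries sum to 25, so either can be used for the tally; bincount just states the intent more directly.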