I’ve seen two posts about index_put on GPU: here and here. Both are marked as solved, but the fix isn’t working for me. The call is fine on CPU, and on GPU with accumulate=False, but with accumulate=True I get:
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel() INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1639180588308/work/aten/src/ATen/native/cuda/Indexing.cu":250, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor
Steps to reproduce:
import torch

t1 = torch.zeros(10).long().cuda()
t2 = torch.randint(10, size=(25,)).cuda()
t1.index_put(indices=[t2], values=torch.tensor(1), accumulate=True)
PyTorch version: 1.10.1
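For what it’s worth, the assert compares the number of flattened indices against values.numel(), so one workaround (an assumption on my part, not an official fix) is to pass a values tensor with one element per index instead of a scalar, so the size check on the CUDA accumulate path is satisfied. Written device-agnostically:

```python
import torch

# Assumed workaround: expand the scalar value to match the number of
# indices, since the CUDA accumulate path appears to reject scalar
# broadcasting of `values` in this version.
device = "cuda" if torch.cuda.is_available() else "cpu"

t1 = torch.zeros(10, dtype=torch.long, device=device)
t2 = torch.randint(10, size=(25,), device=device)

# One value per index (25 here) instead of torch.tensor(1).
values = torch.ones(25, dtype=torch.long, device=device)
out = t1.index_put(indices=[t2], values=values, accumulate=True)

# With accumulate=True, duplicate indices add up, so the total is 25.
assert out.sum().item() == 25
```

This also keeps the dtype and device of values consistent with t1, which the scalar CPU tensor in the original snippet does not.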