Indexing, unexpected behavior in cuda

Hi @zweetvoetje,
I raised an issue several months ago that, I think, touches the same problem you are having: "Is there any alternative to numpy.add.at in PyTorch?"

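Most likely coordlist contains duplicate values. In that case += with index-tensor indexing is ambiguous: the updates for a repeated index are not accumulated, only one of them is written, and on the GPU which one ends up in the result is not deterministic. I would change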

tuss[j,:,self.coordlist[:]] += x[j,:,:]

to

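tuss[j].index_add_(1, self.coordlist, x[j])

(index_add_ is an in-place tensor method with the signature index_add_(dim, index, source). The 1 here assumes coordlist indexes the last dimension of tuss[j], as in your original line, and coordlist has to be an integer index tensor on the same device as tuss.)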
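
To make the difference concrete, here is a minimal CPU sketch with made-up shapes and values (tuss_j, x_j and the contents of coordlist are hypothetical stand-ins for tuss[j], x[j] and self.coordlist). It uses the method form index_add_(dim, index, source) and assumes the index runs along the last dimension:

```python
import torch

# Hypothetical stand-ins: 2 rows, 4 target columns, 3 source columns.
tuss_j = torch.zeros(2, 4)
x_j = torch.ones(2, 3)
coordlist = torch.tensor([0, 0, 2])  # column 0 appears twice

# Indexed +=: the two updates aimed at column 0 are not accumulated.
a = tuss_j.clone()
a[:, coordlist] += x_j

# index_add_ along dim 1 sums every update, duplicates included.
b = tuss_j.clone()
b.index_add_(1, coordlist, x_j)

print(a[0].tolist())  # [1.0, 0.0, 1.0, 0.0]
print(b[0].tolist())  # [2.0, 0.0, 1.0, 0.0]
```

With these values both duplicate updates are identical, so the += result is the same on every run. When the duplicated columns of the source differ, the value that survives can vary on CUDA, which would match the behavior you are seeing.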