I have a 3D tensor of size N x M x C, with N and M very large. I want to increment various elements of the tensor. For the first two dimensions, the indices to be updated are stored in two separate 1D tensors (potentially of different lengths). As a simple example:
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)
x[idx1, idx2, 0] += 1
Ideally, this would update the elements of x at indices (3,0,0), (3,1,0), (3,2,0), (2,0,0), etc. However, the above code raises the error:
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [2], [3]
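The error arises because advanced indexing pairs the index tensors elementwise after broadcasting, so a shape-(2,) and a shape-(3,) tensor cannot be zipped together. One way to get the intended outer-product behavior is to unsqueeze the index tensors so they broadcast to a (len(idx1), len(idx2)) grid. A sketch (note the caveat that += through advanced indexing can drop repeated increments if idx1 or idx2 contains duplicate values):

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# idx1[:, None] has shape (2, 1) and idx2[None, :] has shape (1, 3);
# together they broadcast to a (2, 3) grid of (row, col) pairs,
# i.e. the outer product of the two index lists.
x[idx1[:, None], idx2[None, :], 0] += 1
```

After this, x[3, 0, 0], x[3, 1, 0], x[3, 2, 0], x[2, 0, 0], x[2, 1, 0], and x[2, 2, 0] are all 1.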
A similar question was previously asked, and that solution works if x is only accessed and not modified (i.e., calling x[idx1][:, idx2, 0]). N and M are large enough that calling torch.cartesian_prod to generate the index pairs yields out-of-memory errors.
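If the index tensors may contain duplicates, Tensor.index_put_ with accumulate=True sums every contribution instead of overwriting, and the unsqueezed indices broadcast inside the call, so no explicit (len(idx1) * len(idx2), 2) pair tensor from torch.cartesian_prod is materialized (memory is still proportional to the number of updated elements, for the values tensor). A sketch:

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2, 3], dtype=torch.long)  # row 3 appears twice
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# index_put_ broadcasts the index tensors against each other;
# accumulate=True sums repeated positions rather than overwriting them.
vals = torch.ones(idx1.numel(), idx2.numel())
x.index_put_((idx1[:, None], idx2[None, :], torch.tensor(0)), vals,
             accumulate=True)
```

Here position (3, 0, 0) ends up at 2 because row 3 is indexed twice, which the plain += form would silently collapse to 1.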
In low-level CUDA this feels reasonably straightforward and memory-efficient to do, but I could not figure out the right Python syntax to make it work at all, much less efficiently.
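The atomicAdd-style scatter one would write in a CUDA kernel can be approximated with Tensor.index_add_ over a flattened view: linearize each (i, j, c) triple into an offset into the contiguous storage and scatter-add ones at those offsets. A sketch (assumes x is contiguous so view(-1) is valid):

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

N, M, C = x.shape
# Linear offset of element (i, j, c) in contiguous NxMxC storage
# is (i*M + j)*C + c; here c = 0.
flat = ((idx1[:, None] * M + idx2[None, :]) * C + 0).reshape(-1)
# index_add_ accumulates, so duplicate offsets are summed,
# mirroring an atomicAdd in a hand-written kernel.
x.view(-1).index_add_(0, flat, torch.ones(flat.numel()))
```

Because the flat view shares storage with x, the in-place index_add_ updates the original tensor.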