I have a 3D tensor of size *N* x *M* x *C*, with *N* and *M* very large. I want to increment selected elements of the tensor. For the first two dimensions, the indices to update are stored in two separate 1D tensors (potentially of different lengths). As a simple example:

```
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)
x[idx1, idx2, 0] += 1
```

Ideally, this would update the elements of `x` at indices (3,0,0), (3,1,0), (3,2,0), (2,0,0), etc. However, the above code raises:

`IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [2], [3]`
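Presumably the index tensors need a common broadcast shape. Unsqueezing them like this does run and hits the full cross product on the toy example, though I'm not sure it stays memory-safe at scale (a sketch; it assumes each index tensor has distinct entries, so `+=` never needs to accumulate duplicates):

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# Unsqueezing gives the index tensors shapes [2, 1] and [1, 3],
# which broadcast to a common [2, 3] grid, so every (i, j) pair
# from idx1 x idx2 gets incremented in channel 0.
x[idx1[:, None], idx2[None, :], 0] += 1
```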

A similar question was asked previously, and that solution works only if `x` is read rather than modified (i.e., calling `x[idx1][:, idx2, 0]`), because chained advanced indexing returns a copy, so in-place writes to it are lost. *N* and *M* are large enough that generating the full list of index pairs with `torch.cartesian_prod` runs out of memory.
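For concreteness, the `torch.cartesian_prod` version I mean is roughly the following. It produces the right result on the toy example, but the intermediate `pairs` tensor is what blows up when the index lists approach *N* and *M* in length:

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# cartesian_prod materializes every (i, j) pair explicitly as a
# [len(idx1) * len(idx2), 2] tensor -- the source of the OOM at scale.
pairs = torch.cartesian_prod(idx1, idx2)
x[pairs[:, 0], pairs[:, 1], 0] += 1
```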

In low-level CUDA, this feels reasonably straightforward and memory-efficient to do, but I could not figure out the right Python syntax to make it work at all, much less efficiently.
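One direction I have considered, though I'm not sure it is the intended idiom: since the update is a constant +1 over a (row set) x (column set) product, it can be written as the outer product of two indicator vectors, needing only O(*N* + *M*) extra memory for the indicators plus one *N* x *M* intermediate, which is no larger than a single channel of `x`. This sketch assumes a uniform increment and distinct indices; repeated indices would need per-index counts (e.g., via `torch.bincount`) instead of 0/1 indicators:

```python
import torch

N, M, C = 5, 6, 2
x = torch.zeros([N, M, C])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# Indicator vectors over the row and column axes.
r = torch.zeros(N)
r[idx1] = 1
c = torch.zeros(M)
c[idx2] = 1

# Their outer product is 1 exactly on the idx1 x idx2 grid;
# adding it to channel 0 performs all the increments at once.
x[:, :, 0] += r[:, None] * c[None, :]
```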