I have a 3D tensor of size N x M x C, with N and M very large. I want to increment various elements of the tensor. For the first two dimensions, the indices to be updated are stored in two separate 1D tensors (potentially of different lengths). As a simple example:
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)
x[idx1, idx2, 0] += 1
Ideally, this would update the elements of x at indices (3,0,0), (3,1,0), (3,2,0), (2,0,0), etc. However, the above code raises the error:
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [2], [3]
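The error arises because advanced indexing pairs the index tensors elementwise after broadcasting, so a shape-(2,) and a shape-(3,) tensor cannot be zipped together. One way to get the intended outer-product behavior is to unsqueeze the index tensors so they broadcast to a (len(idx1), len(idx2)) grid. A sketch (note the caveat that += through advanced indexing can drop repeated increments if idx1 or idx2 contains duplicate values):

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# idx1[:, None] has shape (2, 1) and idx2[None, :] has shape (1, 3);
# together they broadcast to a (2, 3) grid of (row, col) pairs,
# i.e. the outer product of the two index lists.
x[idx1[:, None], idx2[None, :], 0] += 1
```

After this, x[3, 0, 0], x[3, 1, 0], x[3, 2, 0], x[2, 0, 0], x[2, 1, 0], and x[2, 2, 0] are all 1.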
A similar question was previously asked, and that solution works if x is only accessed and not modified (i.e., calling x[idx1][:, idx2, 0]). N and M are large enough that calling torch.cartesian_prod to generate the index pairs yields out-of-memory errors.
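If the index tensors may contain duplicates, Tensor.index_put_ with accumulate=True sums every contribution instead of overwriting, and the unsqueezed indices broadcast inside the call, so no explicit (len(idx1) * len(idx2), 2) pair tensor from torch.cartesian_prod is materialized (memory is still proportional to the number of updated elements, for the values tensor). A sketch:

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2, 3], dtype=torch.long)  # row 3 appears twice
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

# index_put_ broadcasts the index tensors against each other;
# accumulate=True sums repeated positions rather than overwriting them.
vals = torch.ones(idx1.numel(), idx2.numel())
x.index_put_((idx1[:, None], idx2[None, :], torch.tensor(0)), vals,
             accumulate=True)
```

Here position (3, 0, 0) ends up at 2 because row 3 is indexed twice, which the plain += form would silently collapse to 1.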
In low-level CUDA this feels reasonably straightforward and memory-efficient to do, but I could not figure out the right Python syntax to make it work at all, much less efficiently.
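The atomicAdd-style scatter one would write in a CUDA kernel can be approximated with Tensor.index_add_ over a flattened view: linearize each (i, j, c) triple into an offset into the contiguous storage and scatter-add ones at those offsets. A sketch (assumes x is contiguous so view(-1) is valid):

```python
import torch

x = torch.zeros([5, 6, 2])
idx1 = torch.tensor([3, 2], dtype=torch.long)
idx2 = torch.tensor([0, 1, 2], dtype=torch.long)

N, M, C = x.shape
# Linear offset of element (i, j, c) in contiguous NxMxC storage
# is (i*M + j)*C + c; here c = 0.
flat = ((idx1[:, None] * M + idx2[None, :]) * C + 0).reshape(-1)
# index_add_ accumulates, so duplicate offsets are summed,
# mirroring an atomicAdd in a hand-written kernel.
x.view(-1).index_add_(0, flat, torch.ones(flat.numel()))
```

Because the flat view shares storage with x, the in-place index_add_ updates the original tensor.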