Should the index of a matrix be transformed for GPU acceleration?

I want to accelerate the multiplication of two numpy matrices X and Y on the GPU, such as:

```python
index = np.array([1, 3, 5, 7])
X[index, :] * Y[index, :]
```

First, I transfer the data with torch.from_numpy(X).cuda() and torch.from_numpy(Y).cuda().
But should the index of the matrix also be transformed to a torch tensor, i.e. torch.from_numpy(index).cuda()?

Note that I did not encounter an error when the index was not transformed, but I am not sure whether performance is affected.
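(A minimal sketch of the two variants in question, assuming X and Y are numpy float arrays and a CUDA device is available; the array shapes are just for illustration:)

```python
import numpy as np
import torch

X = np.random.rand(1000, 1000)
Y = np.random.rand(1000, 1000)
index = np.array([1, 3, 5, 7])

# Move the data to the GPU
X_gpu = torch.from_numpy(X).cuda()
Y_gpu = torch.from_numpy(Y).cuda()

# Variant A: index with the original numpy array
# (no error is raised; the index is converted and moved internally)
out_a = X_gpu[index, :] * Y_gpu[index, :]

# Variant B: move the index to the GPU first
index_gpu = torch.from_numpy(index).cuda()
out_b = X_gpu[index_gpu, :] * Y_gpu[index_gpu, :]
```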


Looking at the indexing code, it moves the index to the same device the indexed tensor lives on. As such, it would appear to be more efficient to move it in advance.

You could always time it, too (but don’t forget to call torch.cuda.synchronize() before you take the start/finish times). Indexing should be a bit faster if you have the index tensor on the GPU.
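(A rough timing sketch along those lines; the helper name and tensor shapes here are just illustrative:)

```python
import time
import torch

X_gpu = torch.rand(10000, 1000, device="cuda")
Y_gpu = torch.rand(10000, 1000, device="cuda")

def time_indexing(idx, n_iters=100):
    # Warm-up so one-off setup costs are not measured
    for _ in range(10):
        X_gpu[idx, :] * Y_gpu[idx, :]
    torch.cuda.synchronize()  # GPU work is asynchronous: flush it before starting the clock
    start = time.time()
    for _ in range(n_iters):
        X_gpu[idx, :] * Y_gpu[idx, :]
    torch.cuda.synchronize()  # and wait for the GPU to finish before stopping the clock
    return (time.time() - start) / n_iters

cpu_index = torch.tensor([1, 3, 5, 7])  # index living on the CPU
gpu_index = cpu_index.cuda()            # same index, already on the GPU
print("CPU-side index:", time_indexing(cpu_index))
print("GPU-side index:", time_indexing(gpu_index))
```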

Best regards

Thomas