I want to accelerate the multiplication of two NumPy matrices X and Y on the GPU, e.g.:
index=np.array([1,3,5,7])
X[index,:]*Y[index,:]
First, I transform them with torch.from_numpy(X).cuda() and torch.from_numpy(Y).cuda().
But should the index array also be transformed to a torch tensor, i.e. torch.from_numpy(index).cuda()?

Note that I did not encounter an error when I left it untransformed, but I am not sure whether the acceleration is affected.
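For reference, here is a minimal sketch of the setup described above, with hypothetical example data and a CPU fallback so it also runs without a GPU:

```python
import numpy as np
import torch

# Fall back to CPU if no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical example data (shapes chosen for illustration).
X = np.random.rand(10, 4).astype(np.float32)
Y = np.random.rand(10, 4).astype(np.float32)
index = np.array([1, 3, 5, 7])

Xt = torch.from_numpy(X).to(device)
Yt = torch.from_numpy(Y).to(device)
idx = torch.from_numpy(index).to(device)  # index moved to the same device

# Elementwise product of the selected rows, computed on `device`.
Z = Xt[idx, :] * Yt[idx, :]
```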

Looking at the indexing code, it moves the index to the same device as the indexed tensor. As such, it would appear to be more efficient to do it in advance.

You could always time it (but don't forget to call torch.cuda.synchronize() before you take the start/finish time), too. Indexing should be a bit faster if you have the index tensor on the GPU.
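A possible timing sketch along these lines (the sizes and iteration count are arbitrary; on a CPU-only machine both variants simply measure the same thing, since the synchronize calls are skipped):

```python
import time
import torch

def time_indexing(x, idx, iters=100):
    # Synchronize before taking timestamps so queued GPU kernels
    # don't pollute the measurement; no-op on CPU tensors.
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = x[idx, :]
    if x.is_cuda:
        torch.cuda.synchronize()
    return time.perf_counter() - start

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(10000, 64, device=device)
idx_cpu = torch.arange(0, 10000, 2)   # index tensor left on the CPU
idx_dev = idx_cpu.to(device)          # index tensor moved in advance

t_cpu = time_indexing(x, idx_cpu)
t_dev = time_indexing(x, idx_dev)
print(f"index on cpu: {t_cpu:.4f}s, index on {device}: {t_dev:.4f}s")
```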