First of all, enjoy the Christmas holidays! Secondly, I have a question regarding the torch function `torch.index_put`. As input it receives `indices` (a tuple of LongTensors), and I am wondering whether this function is supported on the GPU, or whether it moves all the involved data to RAM and runs on the CPU. A tuple is a Python object and therefore not available on the GPU, isn't it? It is also a little confusing, at least to me, why `indices` is a tuple and not just a single `torch.long` tensor. One explanation would be that with the option `accumulate=True` one has to use something like an atomic-add function, so the function itself might not be suitable for GPU support, but I don't know whether this is true.
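To make the tuple semantics concrete, here is a small sketch of my own (not taken from the docs, and run on CPU here): as far as I understand, the tuple just groups one index tensor per indexed dimension, and the actual index data lives inside those tensors.

```python
import torch

# The tuple passed to index_put holds one LongTensor per indexed
# dimension; the tuple itself is only a container, the indices are
# tensor data (so the same call could be made with CUDA tensors,
# assuming index_put is supported there).
t = torch.zeros(3, 3)
rows = torch.tensor([0, 0, 2])   # dimension-0 indices
cols = torch.tensor([1, 1, 2])   # dimension-1 indices
vals = torch.ones(3)

# accumulate=True sums values that hit the same position: (0, 1) is hit twice
out = torch.index_put(t, (rows, cols), vals, accumulate=True)
print(out[0, 1].item())  # 2.0 (two values accumulated into one cell)
print(out[2, 2].item())  # 1.0
```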
```python
import torch as tr
import time

device = tr.device('cuda:0')
#device = tr.device('cpu')

N = 10000
repeat = 10

x = tr.ones(N, 1, device=device)
y = tr.zeros(x.shape, device=device)
idx = tr.zeros(1, N, dtype=tr.long, device=device)
z = tr.zeros(1, 1, device=device)

start = tr.cuda.Event(enable_timing=True)
end = tr.cuda.Event(enable_timing=True)

start.record()
#start = time.time()
for i in range(repeat):
    z = tr.index_put(z, tuple(idx), x, accumulate=True)
#end = time.time()
end.record()

# Waits for everything to finish running
tr.cuda.synchronize()

print(start.elapsed_time(end)/repeat)
#print((end-start)/repeat)
print(z)
```
With this benchmark, the GPU is much slower than the CPU.
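One thing I notice (my own sketch below, not a confirmed explanation): in the benchmark all N indices point at the same element, so with `accumulate=True` every atomic add would target a single memory location, which could serialize on the GPU. The same reduction can also be written with `index_add_`; here I only verify on CPU that both give the same result, I haven't checked whether one is faster on CUDA.

```python
import torch

# Worst case for atomic adds: all indices collide on element 0.
N = 8
x = torch.ones(N, 1)
idx = torch.zeros(N, dtype=torch.long)

z = torch.zeros(1, 1)
via_index_put = torch.index_put(z, (idx,), x, accumulate=True)

# index_add_ accumulates row x[i] into row idx[i] of z2 -- the same
# "scatter with accumulation" operation written a different way.
z2 = torch.zeros(1, 1)
z2.index_add_(0, idx, x)

print(via_index_put.item(), z2.item())  # both 8.0
```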
Hopefully someone can help,