Hello everyone,

first of all, enjoy the Christmas holidays! I have a question regarding the function torch.index_put. As input it receives indices (a tuple of LongTensors), and I am wondering whether this function is supported on the GPU, or whether it moves all the involved data to host RAM and runs on the CPU. A tuple is a Python object and therefore not available on the GPU, isn't it? It is also a little confusing, at least to me, why indices is a tuple rather than a single torch.Tensor of dtype torch.long. One explanation would be that with accumulate=True the implementation has to use something like an atomic add, so the function might not be suitable for GPU support, but I don't know whether this is true.
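As far as I understand, with accumulate=True all values whose indices collide are summed into the same element, which is why I suspect an atomic add would be needed on the GPU. A small CPU-only sketch of what I mean (my own toy example, not from the docs):

```python
import torch

# Four writes all target z[0]; with accumulate=True they are summed,
# which on a GPU would presumably require something like atomicAdd.
z = torch.zeros(2)
idx = (torch.zeros(4, dtype=torch.long),)  # tuple of LongTensors
vals = torch.ones(4)

out = torch.index_put(z, idx, vals, accumulate=True)
print(out)  # tensor([4., 0.])

# Without accumulate, the colliding writes just overwrite each other.
out2 = torch.index_put(z, idx, vals, accumulate=False)
print(out2)  # tensor([1., 0.])
```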
Minimal example:
import torch as tr
import time

device = tr.device('cuda:0')
# device = tr.device('cpu')

N = 10000
repeat = 10

x = tr.ones(N, 1, device=device)
y = tr.zeros(x.shape, device=device)
idx = tr.zeros(1, N, dtype=tr.long, device=device)
z = tr.zeros(1, 1, device=device)

# CUDA events for timing; for the CPU run, use time.time() instead
start = tr.cuda.Event(enable_timing=True)
end = tr.cuda.Event(enable_timing=True)

start.record()
# start = time.time()
for i in range(repeat):
    z = tr.index_put(z, tuple(idx), x, accumulate=True)
# end = time.time()
end.record()

# waits for everything to finish running
tr.cuda.synchronize()
print(start.elapsed_time(end) / repeat)
# print((end - start) / repeat)
print(z)
On my machine, the GPU run is much slower than the CPU run.
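For what it's worth, in my example every index points at the same element z[0, 0], so the whole call is effectively just a sum of x, i.e. maximal index collision, which would fit the atomic-add hypothesis. A quick CPU-only sanity check that the two agree (my own test, not from the docs):

```python
import torch

N = 10000
x = torch.ones(N, 1)
z = torch.zeros(1, 1)
idx = (torch.zeros(N, dtype=torch.long),)  # every write hits z[0, 0]

# With accumulate=True this reduces all of x into a single element,
# so it should match a plain sum.
z_ip = torch.index_put(z, idx, x, accumulate=True)
print(z_ip)     # tensor([[10000.]])
print(x.sum())  # tensor(10000.)
assert torch.allclose(z_ip, x.sum().reshape(1, 1))
```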
Hopefully someone can help.

Greetings