PyTorch for loop slows down

ywcmaike · November 21, 2019, 9:06am

import time
import torch

a = torch.rand(2000, 3000).cuda()
time_start = time.time()
for i in range(2048):
    max_index = torch.argmax(a)
    print("run %.2f" % (time.time() - time_start))

The first 1024 iterations of the for loop return quickly, but the later ones are slower.
How can I fix this problem?

albanD · November 21, 2019, 3:20pm

Hi,

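This is because the CUDA API is asynchronous. The first calls just launch jobs onto the GPU and return right away. But once the queue of pending tasks is full, each new launch has to wait for an earlier one to finish, and so the loop slows down.
If you want correct timing, you need to synchronize before reading the clock: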

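import time
import torch

a = torch.rand(2000, 3000).cuda()
torch.cuda.synchronize()  # finish pending GPU work before starting the clock
time_start = time.time()
for i in range(2048):
    max_index = torch.argmax(a)
    torch.cuda.synchronize()  # wait until this argmax has actually run
    print("run %.2f" % (time.time() - time_start))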
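
To make the queueing explanation above concrete, here is a toy model in plain Python (no GPU needed). It is only a sketch under simplifying assumptions: a single device that runs kernels back to back, and a host that blocks whenever a fixed number of launches are still pending. The name `simulate_host_loop`, its parameters, and the numbers used (capacity 4, launch cost 1, kernel cost 10) are hypothetical and chosen for readability, not measured; the real queue depth depends on the driver and hardware.

```python
def simulate_host_loop(n_launches, queue_capacity, launch_cost, kernel_cost):
    """Return the host-side time at which each launch call returns.

    Toy model, not the real CUDA scheduler: the device executes kernels
    one after another, and the host may have at most `queue_capacity`
    unfinished kernels outstanding.
    """
    finish_times = []  # device-side completion time of each kernel
    return_times = []  # host-side time when each launch call returns
    host_time = 0.0

    for i in range(n_launches):
        host_time += launch_cost  # cost of issuing the launch itself

        # Queue full: block until kernel i - queue_capacity has finished.
        if i >= queue_capacity:
            host_time = max(host_time, finish_times[i - queue_capacity])

        # The kernel starts once it is submitted and the device is free.
        device_free = finish_times[-1] if finish_times else 0.0
        start = max(host_time, device_free)
        finish_times.append(start + kernel_cost)
        return_times.append(host_time)

    return return_times


times = simulate_host_loop(n_launches=8, queue_capacity=4,
                           launch_cost=1, kernel_cost=10)
print(times)  # [1.0, 2.0, 3.0, 4.0, 11.0, 21.0, 31.0, 41.0]
```

In this model the first four calls return after one time unit each, because they only enqueue work. From the fifth call on, the host advances at roughly the kernel cost per iteration, which is the same pattern as the fast first iterations and slower later ones reported in the question.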