PyTorch for loop slows down

import time
import torch

a = torch.rand(2000, 3000).cuda()
time_start = time.time()
for i in range(2048):
    max_index = torch.argmax(a)
    print("run %.2f" % (time.time() - time_start))

The first 1024 or so iterations of the loop return quickly, but the later ones are slower.
How can I fix this problem?

Hi,

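This is because the CUDA API is asynchronous. The first calls just launch jobs onto the GPU and return immediately, without waiting for them to finish. But once the queue of pending tasks is full, each new launch has to wait, and so the loop slows down.
If you want correct timing, you need to synchronize before reading the clock: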

a = torch.rand(2000,3000).cuda()
torch.cuda.synchronize()
time_start = time.time()
for i in range(2048):
    max_index = torch.argmax(a)
    torch.cuda.synchronize()
    print("run %.2f" % (time.time() - time_start))
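
To see the effect described above more directly, here is a rough sketch that separates the time spent launching the calls from the time until the GPU has actually finished them. The helper name `time_argmax` and the iteration count are made up for illustration, and the CPU fallback is only there so the snippet also runs on a machine without a GPU.

```python
import time
import torch

def time_argmax(a, iters=100):
    """Return (launch_seconds, total_seconds) for `iters` argmax calls on `a`.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    use_cuda = a.is_cuda
    if use_cuda:
        torch.cuda.synchronize()  # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        max_index = torch.argmax(a)
    # On CUDA this mostly measures how long it took to queue the kernels.
    launch = time.perf_counter() - start
    if use_cuda:
        torch.cuda.synchronize()  # wait for every queued kernel to finish
    total = time.perf_counter() - start
    return launch, total

# Assumption: fall back to the CPU when no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.rand(2000, 3000, device=device)
launch, total = time_argmax(a)
print("launch %.4f s, total %.4f s" % (launch, total))
```

On a GPU, the launch time can be much smaller than the total, which is the gap the unsynchronized loop hides. On the CPU the two numbers should be nearly identical, since each call finishes before it returns.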