First-time access to a torch tensor is too slow (strange behavior)

Update: my question was too long, so I found a way to explain it more briefly :D. Thank you for accepting my question.

Hi, I was training my model when I noticed something strange. I rewrote my code to make it easier to read; take a look at the snippet below:

import time
import torch

matched = torch.LongTensor(8732).to("cuda")
for epoch in range(config.start_epoch, config.end_epoch):
    for imgs, targets in dataloader:
        
        imgs = imgs.to(config.device)
        targets = [target.to(config.device) for target in targets]

        with torch.set_grad_enabled(False):
            loc, conf = net(imgs)  

            for i in range(5):
                t = time.time()
                matched[0] = -1
                print(time.time() - t)

        print("finished 1 iteration!")

I created the matched tensor and passed the images into my neural network; after that, I measured how long it takes to assign just one element of matched:

finished 1 iteration!
0.20601940155029297
0.0
0.0
0.0
0.006075859069824219
finished 1 iteration!
0.21062469482421875
0.0
0.0
0.0
0.0
finished 1 iteration!

You can see that the first assignment to matched[0] takes a long time, but when I don't pass the images into the neural network, it takes much less time:

matched = torch.LongTensor(8732).to("cuda")
for epoch in range(config.start_epoch, config.end_epoch):
    for imgs, targets in dataloader:
        
        imgs = imgs.to(config.device)
        targets = [target.to(config.device) for target in targets]

        with torch.set_grad_enabled(False):
            # loc, conf = net(imgs)  # do not pass the images into the network

            for i in range(5):
                t = time.time()
                matched[0] = -1
                print(time.time() - t)

        print("finished 1 iteration!")
   

The result is:

0.0
0.0
0.0010001659393310547
0.0
0.0
finished 1 iteration!
0.0
0.0
0.0
0.0
0.0
finished 1 iteration!

I don't understand why; please let me know the reason.

// config.device = "cuda" in my code

CUDA operations are executed asynchronously, so you would need to synchronize the code via torch.cuda.synchronize() before starting and stopping the timers. Otherwise you would measure the dispatching, kernel launches, and implicit synchronizations in your code.
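For reference, here is a minimal sketch of the corrected timing, assuming a CUDA device is available; net and imgs are hypothetical stand-ins for the poster's model and batch:

import time
import torch

device = "cuda"
net = torch.nn.Linear(512, 512).to(device)      # stand-in for the poster's network
imgs = torch.randn(64, 512, device=device)      # stand-in for a batch of inputs
matched = torch.empty(8732, dtype=torch.long, device=device)

with torch.no_grad():
    out = net(imgs)  # kernels are only queued here, not yet finished

    for i in range(5):
        torch.cuda.synchronize()  # drain queued kernels before starting the timer
        t = time.time()
        matched[0] = -1
        torch.cuda.synchronize()  # wait for the assignment itself before stopping
        print(time.time() - t)

With the first synchronize() in place, every iteration should report a similarly small number. Without it, the host-to-device copy performed by matched[0] = -1 can implicitly synchronize with the still-running forward-pass kernels, which would explain why only the first assignment after net(imgs) appeared slow.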


It has been very helpful to me, thank you