The speed of tensor.to(device)

I am running two different segmentation models. The inputs have the same size and data type, but the speed of the following code is quite different. What causes the difference?
Model 1 code

            # images:  torch.float32 cpu False torch.Size([2, 3, 480, 480])
            # targets: torch.int64 cpu False torch.Size([2, 480, 480])
            end2 = time.time()
            images = images.to(self.device)
            targets = targets.to(self.device)
            data_to_device = time.time() - end2  

The data_to_device time for this model is 0.0568695068359375 seconds.

Model 2 code

        print(images.dtype, images.device, images.requires_grad, images.size())
        print(target_mask.dtype, target_mask.device,
              target_mask.requires_grad, target_mask.size())
        end2 = time.time()
        images = images.to(device)
        target_mask = target_mask.to(device)
        data_to_device = time.time() - end2
        print('=============', data_to_device)

The output of model 2 for the first few iterations is:

2019-07-14 18:23:21,835 agfcoo.trainer INFO: Start training
torch.float32 cpu False torch.Size([2, 3, 480, 480])
torch.int64 cpu False torch.Size([2, 480, 480])
============= 0.0012090206146240234
torch.float32 cpu False torch.Size([2, 3, 480, 480])
torch.int64 cpu False torch.Size([2, 480, 480])
============= 0.5653738975524902
torch.float32 cpu False torch.Size([2, 3, 480, 480])
torch.int64 cpu False torch.Size([2, 480, 480])
============= 0.28435635566711426
torch.float32 cpu False torch.Size([2, 3, 480, 480])
torch.int64 cpu False torch.Size([2, 480, 480])
============= 0.26169490814208984
torch.float32 cpu False torch.Size([2, 3, 480, 480])
torch.int64 cpu False torch.Size([2, 480, 480])
============= 0.2824215888977051
torch.float32 cpu False torch.Size([2, 3, 480, 480])
torch.int64 cpu False torch.Size([2, 480, 480])
============= 0.2888615131378174

This computer has one RTX 2060. The PyTorch version is 1.1 and CUDA is 10.0.

What caused the difference?

Hi, I have the same question. Same data but very different speeds with PyTorch 1.0.
Have you solved it?

Since CUDA operations are executed asynchronously, you should synchronize before starting and stopping the timer via torch.cuda.synchronize().
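
Something like this rough sketch could isolate the copy time. The `device` and the dummy tensors below are only placeholders mirroring the shapes and dtypes from your printouts, not your actual data loader:

    import time
    import torch

    # Dummy tensors matching the printed shapes/dtypes (placeholder values; any
    # values work for a pure host-to-device transfer timing).
    device = torch.device('cuda')
    images = torch.rand(2, 3, 480, 480, dtype=torch.float32)
    targets = torch.randint(0, 21, (2, 480, 480), dtype=torch.int64)

    # Finish all previously queued GPU work before starting the timer; otherwise
    # the .to(device) calls can absorb the runtime of earlier asynchronous kernels
    # (e.g. the forward/backward pass of the previous iteration).
    torch.cuda.synchronize()
    end2 = time.time()

    images = images.to(device)
    targets = targets.to(device)

    # Wait for the copies themselves to complete before stopping the timer.
    torch.cuda.synchronize()
    data_to_device = time.time() - end2
    print('=============', data_to_device)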

Could you add it and profile the code again?