Different speed for masked tensor assignment on different PyTorch versions (0.2 and 0.4)

Hi all,

I recently ran into a problem: assigning values to a tensor through a boolean index mask runs at very different speeds on different PyTorch versions (PyTorch 0.2 and PyTorch 0.4).

This is the test code:

import torch
import time
import numpy as np

def main():
    # The tensors were generated once with the lines below and saved to disk,
    # so that both PyTorch versions load exactly the same data:
    #tensor_a = torch.rand(20, 100, 100).cuda()
    #tensor_b = torch.rand(20, 100, 100).cuda()
    #np.save('tensor_a.npy', tensor_a.cpu().numpy())
    #np.save('tensor_b.npy', tensor_b.cpu().numpy())
    tensor_a = torch.from_numpy(np.load('tensor_a.npy', encoding="latin1")).cuda()
    tensor_b = torch.from_numpy(np.load('tensor_b.npy', encoding="latin1")).cuda()
    torch.cuda.synchronize()
    end = time.time()
    for i in range(100):
        # masked assignment: zero out all positions where tensor_a <= 0.5
        tensor_b[tensor_a <= 0.5] = 0
    torch.cuda.synchronize()
    print('run time is:', time.time() - end)

if __name__ == '__main__':
    main()

These are the timings:
PyTorch 0.2: total time is 0.0015 s
PyTorch 0.4: total time is 0.17 s

What causes this slowdown on PyTorch 0.4, and how can I work around it?
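For comparison, I could also write the same update with masked_fill_ instead of boolean-mask indexing. I don't know whether it takes a different code path on 0.4, so this is only a sketch to measure against (it assumes the same saved tensor files as above):

import torch
import time
import numpy as np

def main():
    tensor_a = torch.from_numpy(np.load('tensor_a.npy', encoding="latin1")).cuda()
    tensor_b = torch.from_numpy(np.load('tensor_b.npy', encoding="latin1")).cuda()
    torch.cuda.synchronize()
    end = time.time()
    for i in range(100):
        # same effect as tensor_b[tensor_a <= 0.5] = 0, written via masked_fill_
        tensor_b.masked_fill_(tensor_a <= 0.5, 0)
    torch.cuda.synchronize()
    print('masked_fill_ run time is:', time.time() - end)

if __name__ == '__main__':
    main()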

Thanks

I assume you are using the shapes [20, 100, 100] for your tests?
Could you also time the code using the latest PyTorch version (1.1.0)?

Also, which GPU, CUDA and cuDNN version are you using?
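If it helps, the relevant versions can be printed directly from Python (these are the standard torch attributes in recent releases; I'm not certain all of them exist back on 0.2):

import torch

# Report the environment so timings can be compared across installations
print('PyTorch:', torch.__version__)
print('CUDA (as built):', torch.version.cuda)
print('cuDNN:', torch.backends.cudnn.version())
print('GPU:', torch.cuda.get_device_name(0))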

The GPU is a GeForce GTX 1080 Ti, CUDA is 9.0, and cuDNN is 7.0. I haven't tried PyTorch 1.1 yet, but both PyTorch 0.2 and 0.3 are fine.