Hi, all:

Recently I ran into a problem: assigning values to a tensor through a boolean-mask index runs at very different speeds on different PyTorch versions (PyTorch 0.2 and PyTorch 0.4).

This is the test code:

```
import time

import numpy as np
import torch


def main():
    # The data was generated once and saved, so both versions use identical inputs:
    # tensor_a = torch.rand(20, 100, 100).cuda()
    # tensor_b = torch.rand(20, 100, 100).cuda()
    # np.save('tensor_a.npy', tensor_a.cpu().numpy())
    # np.save('tensor_b.npy', tensor_b.cpu().numpy())
    tensor_a = torch.from_numpy(np.load('tensor_a.npy', encoding="latin1")).cuda()
    tensor_b = torch.from_numpy(np.load('tensor_b.npy', encoding="latin1")).cuda()

    # Wait for pending GPU work so it is not counted in the timing.
    torch.cuda.synchronize()
    start = time.time()
    for i in range(100):
        # Zero every element of tensor_b where tensor_a <= 0.5.
        tensor_b[tensor_a <= 0.5] = 0
    # Wait for the queued kernels to finish before reading the clock.
    torch.cuda.synchronize()
    print('run time is:', time.time() - start)


if __name__ == '__main__':
    main()
```
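For anyone who wants to reproduce this without the `.npy` files, here is a minimal self-contained sketch. It assumes random tensors of the same shape are a fair stand-in for the saved data, and it also times `masked_fill_`, an in-place alternative to boolean-index assignment, so the two can be compared side by side on each version. The helper names (`time_masked_zeroing`, `zero_by_index`, `zero_by_masked_fill`) are hypothetical, and I have not confirmed that `masked_fill_` avoids the 0.4 slowdown.

```python
import time

import torch


def time_masked_zeroing(fn, a, b, iters=100):
    """Time `iters` calls of fn(a, b), synchronizing around the loop on CUDA."""
    if a.is_cuda:
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn(a, b)
    if a.is_cuda:
        torch.cuda.synchronize()
    return time.time() - start


def zero_by_index(a, b):
    # The pattern from the test code above: boolean-mask index assignment.
    b[a <= 0.5] = 0


def zero_by_masked_fill(a, b):
    # In-place alternative: write 0 wherever the mask is True.
    b.masked_fill_(a <= 0.5, 0)


def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumption: random data of the same shape in place of the saved .npy files.
    a = torch.rand(20, 100, 100, device=device)
    b = torch.rand(20, 100, 100, device=device)

    # Sanity check: both variants should produce identical results.
    b1, b2 = b.clone(), b.clone()
    zero_by_index(a, b1)
    zero_by_masked_fill(a, b2)
    assert torch.equal(b1, b2)

    for name, fn in [("index assignment", zero_by_index),
                     ("masked_fill_", zero_by_masked_fill)]:
        print(name, time_masked_zeroing(fn, a, b.clone()))


if __name__ == "__main__":
    main()
```

Running the same script on both 0.2 and 0.4 would show whether the regression is specific to the indexing path or affects masked writes in general.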

These are the timings (total for the 100-iteration loop):

PyTorch 0.2: total time is 0.0015 s

PyTorch 0.4: total time is 0.17 s
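Since the loop runs 100 times, the per-iteration cost and the ratio between the two versions work out to roughly:

$$
\frac{0.0015\ \text{s}}{100} = 15\ \mu\text{s}, \qquad \frac{0.17\ \text{s}}{100} = 1.7\ \text{ms}, \qquad \frac{0.17}{0.0015} \approx 113
$$

so the same masked assignment is about two orders of magnitude slower on 0.4.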

What causes this slowdown in PyTorch 0.4, and how can I get the speed back on PyTorch 0.4?

Thanks