Torchvision ToTensor MUCH slower than manually normalizing

IsCoelacanth · July 8, 2021, 4:18pm

I’ve been seeing VERY high CPU utilization when using ToTensor in my transforms, and it’s easily the slowest operation in my transform compositions.
System:
torch: 1.8.1
torchvision: 0.9.1

I’ve been facing the same issue since 1.4.0 and even tried building from source (1.6.0 and 1.8.0).

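import numpy as np
import torch

def to_tensor(img):
    img = np.array(img)  # PIL image -> HWC uint8 array
    img = torch.from_numpy(img).float().permute(2, 0, 1)  # HWC -> CHW (a view, no copy)
    img = img / 255.0  # scale to [0, 1]
    return img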

In [12]: %%timeit
    ...: to_tensor(img)
826 µs ± 43.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

vs. torchvision:

    ...: tfunc.to_tensor(img)
1.49 ms ± 73.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

img is a 512x512 RGB PIL Image.
Is there any particular reason this may be happening?

ptrblck · July 13, 2021, 5:29am

Besides the additional checks in the torchvision implementation (NumPy array inputs, number of channels, etc.), your version does not create a contiguous output, whereas TF.to_tensor calls .contiguous() after the permute. Your timing therefore leaves out the copy that the permute operation eventually requires, and you would see the same performance penalty once you include it.
You can check it via:

out = to_tensor(img)
print(out.is_contiguous())
> False

Subsequent operations that require a contiguous input would then trigger the copy themselves, so the cost is moved to them rather than avoided.
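
For a fairer comparison, here is a minimal sketch of a manual conversion that pays for the copy up front, so it can be timed against TF.to_tensor on more equal terms. The name to_tensor_contiguous is hypothetical, and the random NumPy array is only a stand-in for the 512x512 RGB PIL image from the question (np.array(img) accepts either).

```python
import numpy as np
import torch

def to_tensor_contiguous(img):
    """Hypothetical variant of the manual to_tensor that returns a contiguous CHW tensor."""
    arr = np.array(img)                          # HWC, uint8
    t = torch.from_numpy(arr).permute(2, 0, 1)   # CHW view with permuted strides
    t = t.contiguous()                           # the copy happens here, on uint8 data
    return t.float().div(255.0)                  # scale to [0, 1]

# Stand-in for a 512x512 RGB PIL image (assumption: random uint8 data).
img = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)

out = to_tensor_contiguous(img)
print(out.shape, out.dtype, out.is_contiguous())

# Same values as the original manual version, which keeps the permuted strides.
ref = torch.from_numpy(np.array(img)).float().permute(2, 0, 1) / 255.0
print(torch.equal(out, ref), ref.is_contiguous())
```

Timing this version with %%timeit instead of the original to_tensor should show how much of the gap is the layout copy and how much comes from the extra input checks in torchvision.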