I am training a vision model that uses scipy.ndimage.rotate for CPU-side augmentation. This rotation on the CPU has become the bottleneck, so I was looking for suggestions to replace it with torchvision. However, the result is different from torchvision.transforms.functional.rotate.
My test looks something like this:
import numpy as np
from scipy import ndimage
c = np.zeros([4, 4]).astype("float32")
for i in range(4):
    for j in range(4):
        c[i][j] = 4 * i + j
res1 = ndimage.rotate(c, 60, reshape=False, mode="nearest")
print(res1)
import torch
import torchvision
from torchvision.transforms import InterpolationMode
inp = torch.tensor(c, device="cuda:0")
res2 = torchvision.transforms.functional.rotate(inp[None][None], 60, interpolation=InterpolationMode.NEAREST)
print(res2)
The result looks something like this:
res1 :
[[ 2.0601943 3.3288476 7.8078837 11.240777 ]
[ 1.2019709 4.727933 8.9034605 13.673849 ]
[ 1.3261505 6.096539 10.272067 13.798029 ]
[ 3.759223 7.1921163 11.671152 12.939806 ]]
res2:
tensor([[[[ 0., 3., 7., 0.],
[ 1., 6., 10., 15.],
[ 0., 5., 9., 14.],
[ 0., 8., 12., 0.]]]], device='cuda:0')
Thanks for taking the time to read through this! Any suggestions are welcome.
The ndimage result seems unexpected to me. It seems you are creating an example input array using integer values represented in float32. Both transformations use the nearest interpolation mode, which should pick the nearest input value for the corresponding output. Since all inputs are integers, I would expect to also see only integers in the transformed output. While this is the case for torchvision, the ndimage transformation returns floating-point values.
I’m currently not in front of my workstation so cannot reproduce or debug it.
EDIT: I was wrong; the mode argument in ndimage defines the padding behavior. From the docs:
The mode parameter determines how the input array is extended beyond its boundaries. Default is ‘constant’. Behavior for each valid value is as follows (see additional plots and details on boundary modes)
and it seems a spline interpolation is used by default.
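To make the difference concrete, here is a small sketch (assuming SciPy is installed) showing that ndimage.rotate's default cubic spline (order=3) produces fractional values from an integer-valued input, while order=0 falls back to pure nearest-neighbor sampling:

```python
import numpy as np
from scipy import ndimage

c = np.arange(16, dtype="float32").reshape(4, 4)

# Default: cubic spline interpolation (order=3) -> fractional output values
res_spline = ndimage.rotate(c, 60, reshape=False, mode="nearest")

# order=0 disables the spline and picks the nearest input sample instead
res_nn = ndimage.rotate(c, 60, reshape=False, mode="nearest", order=0)

print(res_spline)  # floating-point values between the original integers
print(res_nn)      # contains only values already present in the input
```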
Thank you for the answer. If the input array is represented in uint8, the result looks different:
c = np.zeros([4, 4]).astype("uint8")
res1:
[[ 2 3 8 11]
[ 1 5 9 14]
[ 1 6 10 14]
[ 4 7 12 13]]
res2:
tensor([[[[ 0, 3, 7, 0],
[ 1, 6, 10, 15],
[ 0, 5, 9, 14],
[ 0, 8, 12, 0]]]], device='cuda:0', dtype=torch.uint8)
The two interpolation results still differ, even though both use nearest.
My question: how can I replace scipy.ndimage.rotate with torchvision operations (affine?)
They are not the same, as explained in my previous post, since ndimage.rotate uses a spline interpolation by default.
You could disable it and experiment further to narrow down if more settings need to be changed:
res1 = ndimage.rotate(c, 60, reshape=False, mode="constant", cval=0, order=0, prefilter=False)
This setup reduces the error, but still shows 2 mismatches at the edges:
[[ 0 3 0 0]
[ 0 6 10 15]
[ 0 5 9 0]
[ 0 0 12 0]]
tensor([[[[ 0, 3, 7, 0],
[ 1, 6, 10, 15],
[ 0, 5, 9, 14],
[ 0, 8, 12, 0]]]], device='cuda:0', dtype=torch.uint8)