I am training a vision model that uses scipy.ndimage.rotate for CPU-side augmentation. This rotation on the CPU has become the bottleneck, so I was looking for suggestions to replace it with torchvision. However, the result is different from torchvision.transforms.functional.rotate.
My test looks something like this:
import numpy as np
from scipy import ndimage
c = np.zeros([4, 4]).astype("float32")
for i in range(4):
    for j in range(4):
        c[i][j] = 4 * i + j
res1 = ndimage.rotate(c, 60, reshape=False, mode="nearest")
print(res1)
import torch
import torchvision
from torchvision.transforms import InterpolationMode
inp = torch.tensor(c, device="cuda:0")
res2 = torchvision.transforms.functional.rotate(inp[None][None], 60, interpolation=InterpolationMode.NEAREST)
print(res2)
The result looks something like this:
res1 :
[[ 2.0601943 3.3288476 7.8078837 11.240777 ]
[ 1.2019709 4.727933 8.9034605 13.673849 ]
[ 1.3261505 6.096539 10.272067 13.798029 ]
[ 3.759223 7.1921163 11.671152 12.939806 ]]
res2:
tensor([[[[ 0., 3., 7., 0.],
[ 1., 6., 10., 15.],
[ 0., 5., 9., 14.],
[ 0., 8., 12., 0.]]]], device='cuda:0')
Thanks for taking the time to read through this! Any suggestions are welcome.
The ndimage result seems unexpected to me. It seems you are creating an example input array using integer values represented in float32. Both transformations use the nearest interpolation mode, which should pick the nearest input value for the corresponding output. Since all inputs are integers, I would expect to also see only integers in the transformed output. While this is the case for torchvision, the ndimage transformation returns floating-point values.
I’m currently not in front of my workstation so cannot reproduce or debug it.
EDIT: I was wrong; the mode argument in ndimage defines the padding behavior. From the docs:
The mode parameter determines how the input array is extended beyond its boundaries. Default is ‘constant’. Behavior for each valid value is as follows (see additional plots and details on boundary modes)
and it seems a spline interpolation is used by default.
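To make the difference concrete, here is a small sketch (assuming SciPy is installed) showing that ndimage.rotate's default cubic spline (order=3) produces fractional values from an integer-valued input, while order=0 falls back to pure nearest-neighbor sampling:

```python
import numpy as np
from scipy import ndimage

c = np.arange(16, dtype="float32").reshape(4, 4)

# Default: cubic spline interpolation (order=3) -> fractional output values
res_spline = ndimage.rotate(c, 60, reshape=False, mode="nearest")

# order=0 disables the spline and picks the nearest input sample instead
res_nn = ndimage.rotate(c, 60, reshape=False, mode="nearest", order=0)

print(res_spline)  # floating-point values between the original integers
print(res_nn)      # contains only values already present in the input
```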
Thank you for the answer. If the input array is represented in uint8, the result looks different:
c = np.zeros([4, 4]).astype("uint8")
res1:
[[ 2 3 8 11]
[ 1 5 9 14]
[ 1 6 10 14]
[ 4 7 12 13]]
res2:
tensor([[[[ 0, 3, 7, 0],
[ 1, 6, 10, 15],
[ 0, 5, 9, 14],
[ 0, 8, 12, 0]]]], device='cuda:0', dtype=torch.uint8)
The two interpolation results still differ, even though both use nearest.
My question: how can I replace scipy.ndimage.rotate with torchvision operations (affine?)
They are not the same, as explained in my previous post, since ndimage.rotate uses a spline interpolation by default.
You could disable it and experiment further to narrow down if more settings need to be changed:
res1 = ndimage.rotate(c, 60, reshape=False, mode="constant", cval=0, order=0, prefilter=False)
This setup reduces the error, but still shows 2 mismatches at the edges:
[[ 0 3 0 0]
[ 0 6 10 15]
[ 0 5 9 0]
[ 0 0 12 0]]
tensor([[[[ 0, 3, 7, 0],
[ 1, 6, 10, 15],
[ 0, 5, 9, 14],
[ 0, 8, 12, 0]]]], device='cuda:0', dtype=torch.uint8)