On-the-fly image rotation CPU bottleneck

I am training a rotation detection model and rotating the images on the fly during training (instead of as a preprocessing step). This CPU-side rotation has become the bottleneck: disabling it reduces epoch time from 74s to 6s and drastically improves GPU utilization. I'm looking for suggestions to improve my training performance.

My Dataset class looks something like this:

class RotatingDataset(Dataset):
    ...
    def __getitem__(self, idx):
        target = np.random.uniform(-180, 180)  # fresh random angle on every access
        image = read_pil_image(idx)  # ~380x380 PIL image

        # relevant bit: rotate with canvas expansion and edge-replication fill
        image = scipy.ndimage.rotate(np.asarray(image), target, reshape=True, mode='nearest')
        image = Image.fromarray(image)  # back to PIL for the torchvision transforms

        # apply transforms (mainly resize/crop/normalize)
        image = self.transforms(image)

        return image, target

Some additional details:

  • I’m using scipy.ndimage.rotate (rather than PIL or torchvision.transforms.functional.rotate) because I wanted the mode='nearest' fill.
  • I’ve already applied the usual DataLoader optimization “tricks” (num_workers, pin_memory).
  • I have already resized the images on disk to ~380px to reduce I/O.
  • The main reason I want to rotate on the fly is so that each epoch a given image is rotated by a different angle, to mitigate overfitting.
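For completeness, a minimal sketch of those DataLoader settings (ToyDataset here is a hypothetical stand-in for RotatingDataset, just to make the snippet self-contained):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    # Stand-in for RotatingDataset; only exists to demonstrate loader settings.
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.zeros(3, 380, 380), float(idx)

loader = DataLoader(
    ToyDataset(),
    batch_size=4,
    num_workers=2,            # run __getitem__ (decode + rotate) in parallel worker processes
    pin_memory=True,          # page-locked host memory for faster host-to-GPU copies
    persistent_workers=True,  # keep workers alive between epochs instead of re-forking
)
```

With rotation on the CPU, num_workers is usually the setting that matters most, since it parallelizes the expensive __getitem__ calls.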

Thanks for taking the time to read through this! Any suggestions are welcome.

Have you tried alternative rotation implementations (e.g., skimage’s rotate or Albumentations’ rotate)?
Albumentations in particular publishes benchmarks claiming very fast rotation.


Wow, Albumentations is way faster, thanks for the suggestion! Applying it not only to the rotation but also to the other transforms, I get around 6s per epoch WITH rotation enabled.
