Correct order for torchvision.transforms? (inconsistent behaviour for RandAugment)

Hello,

I’m trying to apply torchvision.transforms.RandAugment to some images, but its behaviour seems inconsistent (I know the transforms themselves are random, so this is a different kind of inconsistency).

The documentation on RandAugment says the input should be of torch.uint8.

However, I’ve found that with inputs of torch.float32 it sometimes works and sometimes raises an error.

My current transforms pipeline looks like this:

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),  # where this is placed seems to affect whether the transform works or not
    transforms.RandAugment()
])

Using the above seems to generate inconsistent results.

But if I change the order:

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandAugment(),
    transforms.ToTensor()
])

It seems to work without fail.

So my questions are:

  1. Is there a best practice on the order of transforms?
  2. Or do I need to not worry about transforms.ToTensor since transforms.RandAugment returns a torch.Tensor?

Example code notebook here: https://colab.research.google.com/drive/1edkqgfDxry8BNvwdf9wGF7KUbvaEMnQm?usp=sharing

Update on 2.

It turns out that without transforms.ToTensor(), e.g.:

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandAugment(),
    # transforms.ToTensor() 
])

Calling methods such as permute() on the output of the above fails with the error:

AttributeError: 'Image' object has no attribute 'permute'

So it seems the ideal order in my case must be:

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandAugment(),
    transforms.ToTensor()
])

Is this right? Or should the augmentations be done before resizing?

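Hi @mrdbourke, remember that ToTensor() scales the image to the range 0 to 1 and returns a torch.float32 Tensor, whereas the documentation you quoted says RandAugment expects torch.uint8 when it is given a Tensor. RandAugment can be applied to a Tensor or to a PIL Image (that’s what you have after Resize((64, 64))), so applying it to the PIL Image before ToTensor() avoids the dtype problem.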

I think that you want this:

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandAugment(),
    transforms.ToTensor()
])

Hope it helps.


Thank you Ivan! You’re right, that order works better. I’m still getting used to constructing PyTorch transforms.

Cheers.