Am I using torchvision.Transforms the right way?

I’m currently using this for the transformations of my images before feeding them into my CNN for training:

self.transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomCrop(60),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

This is located in my IcebergDataset class, which is a subclass of torch.utils.data.Dataset.

During testing, I am still using this transform code, which seems a bit weird to me since I’m augmenting my test set images. Am I using this functionality the right way?

Hi @cakeeatingpolarbear,

It looks correct to me. Why does it seem weird? Are those images normalized?

Sorry, I should clarify what I mean by “weird”. I have a preconception that the test data shouldn’t be changed, and by using the transforms I am changing it. Maybe my preconception is wrong, though, in which case I guess I am using the transformation functionality correctly.

If you’re using the same Dataset object, then yes, you’re using the same transforms for train and test. Therefore, you’re transforming the test set too. Check this tutorial: http://pytorch.org/tutorials/beginner/data_loading_tutorial.html

Normally you have 2 transforms like this:

train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomCrop(60),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])
test_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor()
])

and the custom Dataset object takes transform as a parameter so that you can use different transforms for training data and testing data. E.g.:

trainset = CustomDataset(data, transform=train_transform)
testset = CustomDataset(data, transform=test_transform)
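
For completeness, a minimal sketch of what such a Dataset could look like (the class name and the (image, label) layout here are just illustrative, not taken from the thread):

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    # data: any indexable collection of (image, label) pairs
    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        image, label = self.data[idx]
        # apply whichever transform pipeline was passed in (train or test)
        if self.transform is not None:
            image = self.transform(image)
        return image, label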

This way, your test data is not transformed. Hope this helps.


You could also use CenterCrop for your test dataset.

Also FiveCrop can give you additional performance gains, if runtime (averaging over the 5 crops) is not a major issue.
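
If you go the FiveCrop route, note that it returns a tuple of 5 images, so you need a Lambda to stack them and then average the model outputs over the crop dimension at test time. A rough sketch (the crop size, model, and inputs below are placeholders):

import torch
from torchvision import transforms

fivecrop_transform = transforms.Compose([
    transforms.FiveCrop(60),  # returns a tuple of 5 PIL Images
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

# in the evaluation loop, fold the crop dimension into the batch dimension;
# inputs has shape [batch, ncrops, channels, height, width]
bs, ncrops, c, h, w = inputs.size()
outputs = model(inputs.view(-1, c, h, w))
outputs = outputs.view(bs, ncrops, -1).mean(1)  # average predictions over the 5 crops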


That was pretty silly of me, so obvious that passing in a new Transform would solve everything…


Just use a separate train_transform and val_transform because people often do not do data augmentation on validation data.

Something in this transformation seems odd to me. Please guide me.

test_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ToTensor()
])

This seems logically wrong to me, because I think images in torch are loaded as PIL Images. In order to use them in convolutional networks, we must convert them to tensors, so we use transforms.ToTensor(). When I want to show an image from the dataloader (the image in the dataloader is a tensor), I convert the tensor image to a PIL Image using a transformation. But here both transformations are written in one place. That’s why it seems odd to me.

Please guide me. Thank you in advance

Usually the image is already loaded as a PIL Image, so you do not need to use transforms.ToPILImage(). Also, it is not straightforward to convert a tensor back to an image, because the tensors are usually normalized using a per-channel mean and std.
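
If you do want to visualize a normalized tensor, here is a rough sketch of undoing the normalization first (the mean/std values below are just the common ImageNet numbers; substitute whatever you used in your own pipeline):

import torch
from torchvision import transforms

mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

def tensor_to_pil(img):
    # undo Normalize: x * std + mean, per channel
    img = img * std[:, None, None] + mean[:, None, None]
    img = img.clamp(0, 1)            # ToPILImage expects floats in [0, 1]
    return transforms.ToPILImage()(img)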

If you’re using any data augmentation in PyTorch, such as RandomCrop or a random flip, its input should always be a PIL Image.
Try to transform your train input data without the ToPILImage step:

train_transform = transforms.Compose([
    transforms.RandomCrop(60),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])

You will get an exception, because the input data must be a PIL Image.
So what do these two transforms do? ToPILImage converts a tensor or numpy array into a PIL Image, and ToTensor converts a PIL Image back into a float tensor, scaling each pixel from the 0–255 range to [0, 1]. Imagine you have an RGB image of shape [3x60x60]: each element has a value from 0 to 255 and ends up in [0, 1] after ToTensor.
If you have applied the ToPILImage operation to your train data, you should do it for your test data too.
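
For illustration, a quick check of what the two conversions do (the random array here is just a stand-in for an image loaded as a numpy array):

import numpy as np
from torchvision import transforms

arr = np.random.randint(0, 256, size=(60, 60, 3), dtype=np.uint8)  # H x W x C, values 0-255

pil_img = transforms.ToPILImage()(arr)   # now a PIL Image, pixel values still 0-255
tensor = transforms.ToTensor()(pil_img)  # C x H x W float tensor, values scaled to [0, 1]

print(tensor.shape, tensor.min().item(), tensor.max().item())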
Best regards,
Alex


Thanks for your kind reply, Alex.
Your solution seems correct when I implement all operations and the network from scratch without using tensors. But when I want to use a pretrained model, it is necessary to convert the image to a tensor. This sample code illustrates the issue:
trained_model = torchvision.models.vgg16(pretrained=True)  # or your own model
trained_model.eval()

############ Import new image for classification #########################
img = Image.open('/home/morteza/PycharmProjects/transfer_learning/hymenoptera_data/val/ants/94999827_36895faade.jpg')
img.show()
# Scale was later renamed to Resize in torchvision
loader = transforms.Compose([transforms.Scale(224), transforms.ToTensor()])
img = loader(img).float()
img = Variable(img)
img = img.unsqueeze(0)  # add the batch dimension: [C, H, W] -> [1, C, H, W]
pred = trained_model(img)
print(pred)

If I remove transforms.ToTensor(), I cannot use Variable(img) and img.unsqueeze(0). Then when the PIL Image is fed to the model, a dimension error appears, because the model expects a batched tensor of shape [1, C, H, W].

Of course, because the Variable input must be a torch tensor. If you didn’t convert with ToTensor, it would remain a PIL Image object.

Hi @cakeeatingpolarbear,
I’m trying to use pretty much the same transforms on my Iceberg dataset, but it’s throwing errors because my images have only 2 channels. How many channels does your input image have?

I ended up defining my own transforms, but for the Iceberg dataset you could just make a new channel by averaging the first 2 channels; then you would be able to use the torchvision.transforms functions.
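
Something along these lines, assuming the two bands come in as 2-D float arrays (the rescaling to uint8 is just one convenient way to make them look like an ordinary image to torchvision):

import numpy as np

def make_three_channels(band_1, band_2):
    band_3 = (band_1 + band_2) / 2.0                    # new channel = mean of the two bands
    img = np.stack([band_1, band_2, band_3], axis=-1)   # H x W x 3
    img = (img - img.min()) / (img.max() - img.min())   # rescale to [0, 1]
    return (img * 255).astype(np.uint8)                 # uint8 so ToPILImage/ToTensor behave as usual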


Thanks, that clears up my doubt.