Question about transforms impact on images

Geoffrey_Payne · July 16, 2020, 9:23pm

I load my pickle file as follows:

    import os
    import pickle

    with open(os.path.join(".", picklePath), "rb") as fh:
        data = pickle.load(fh)

and from that I set my datasets for images and labels:

    X_tr = data['training_data'][0]
    y_tr = data['training_data'][1].ravel()
    X_v = data['validation_data'][0]
    y_v = data['validation_data'][1].ravel()
    X_t = data['test_data'][0]
    y_t = data['test_data'][1].ravel()

I take a view of the first image and it looks as expected, so I create this transform (note that the resize is commented out)

    _transforms = transforms.Compose([
        transforms.ToPILImage(),
        #transforms.Resize((ImageSize.width, ImageSize.height)),
        transforms.Grayscale(num_output_channels=1),
        transforms.ToTensor()
    ])

I then use this transform to create a custom dataset in which the labels and the images are aligned

    train_dataset = EmotionDatasetTrain(X_tr, y_tr, transform=_transforms)
    test_dataset = EmotionDatasetTest(X_t, y_t, transform=_transforms)
    val_dataset = EmotionDatasetVal(X_v, y_v, transform=_transforms)

And the custom dataset is defined as follows (similar classes exist for Test and Validation):

    class EmotionDataSet():
        def __init__(self, transform):
            self.transform = transform
            self.classes = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]

        def get_x_y(self, idx, _X, _Y):
            if torch.is_tensor(idx):
                idx = idx.tolist()

            y = _Y[idx].item()

            # Turn the flat pixel array into a 2-D image. NumPy expects
            # (rows, cols), i.e. (height, width); the order only matters
            # for non-square images.
            x = _X[idx].reshape(ImageSize.width, ImageSize.height)
            # Apply the transform only if one was given, so x is always defined.
            if self.transform:
                x = self.transform(x)
            return x, y


    class EmotionDatasetTrain(EmotionDataSet, Dataset):
        def __init__(self, X_Train, Y_Train, transform=None):
            super().__init__(transform)
            self.X_Train = X_Train
            self.Y_Train = Y_Train

        def __len__(self):
            return len(self.X_Train)

        def __getitem__(self, idx):
            x, y = super().get_x_y(idx, self.X_Train, self.Y_Train)
            return x, y
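
Since the Train, Test and Validation classes differ only in which arrays they hold, one class could serve all three splits. A minimal sketch, assuming flat NumPy pixel arrays; the `EmotionDataset` name and the `image_shape` parameter are hypothetical and not from the original post:

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class EmotionDataset(Dataset):
    """One dataset class for any split (hypothetical name, not from the post)."""

    def __init__(self, images, labels, image_shape, transform=None):
        self.images = images            # array of shape (N, height * width)
        self.labels = labels            # array of shape (N,)
        self.image_shape = image_shape  # (height, width), an assumed parameter
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        # Reshape the flat pixel vector into a 2-D image before transforming.
        x = self.images[idx].reshape(self.image_shape)
        if self.transform is not None:
            x = self.transform(x)
        return x, int(self.labels[idx])


# Tiny synthetic stand-in for the pickled data: 5 images of 4 x 4 pixels.
images = np.random.rand(5, 16).astype(np.float32)
labels = np.arange(5)
dataset = EmotionDataset(images, labels, image_shape=(4, 4))
x0, y0 = dataset[0]
```

The same class would then be instantiated three times, once per split, with the matching image and label arrays.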

And I have a dataloader:

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=Settings.batchSize,
        num_workers=0,
        shuffle=True
    )

My problem is that although the images are read in correctly and can be displayed via Matplotlib, after the transform the images deteriorate considerably when viewed via Matplotlib.
[Image: After Face]
The image is mostly purple with yellow outlines of the face. The resolution has gone completely and the faces are barely recognisable. When I run them through the CNN the loss value barely changes after each epoch, and given how crude the images are that is hardly surprising. I noticed the following:
1/ I was hoping the Resize transform would convert an array of 10,000 float numbers into a 100 * 100 matrix. It does not, so I use a reshape instead. I clearly misunderstand the Resize transform.
2/ I expected the Grayscale transform to turn an RGB image into a single-channel grayscale image. I thought I would see a black and white version of the colour photo at the same resolution. Instead, this appears to be the transform that radically changed the image as described earlier.
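
On point 1/, `transforms.Resize` rescales an existing 2-D image to a new height and width by interpolation; it does not rearrange a flat vector into rows and columns, which is what `reshape` does. A small sketch using NumPy only, with a synthetic array standing in for the real data:

```python
import numpy as np

# Synthetic stand-in for one flattened image: 100 * 100 = 10,000 values.
flat = np.arange(10_000, dtype=np.float32)

# reshape only changes how the same values are indexed; nothing is interpolated.
image = flat.reshape(100, 100)  # NumPy order is (rows, cols) = (height, width)

# Row r, column c of the image is element r * 100 + c of the flat array.
same_pixel = image[3, 7] == flat[3 * 100 + 7]

# A flat array whose length is not height * width cannot be reshaped.
try:
    np.arange(1_000).reshape(100, 100)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

`Resize` would only come into play afterwards, for example to shrink the 100 * 100 image to a smaller input size for the network.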

I am new to all this, can someone explain what is happening and how I can fix this?

ptrblck · July 19, 2020, 8:41am

I guess the ToPILImage transformation (and later the ToTensor) might try to normalize your numpy arrays, which might result in a clamped output.
Could you try to pass a single input array to torchvision.transforms.functional.to_pil_image() as well as to_tensor() and check the output values?

Geoffrey_Payne · July 20, 2020, 1:56pm

The problem was that although Matplotlib displayed the image with a green tinge, it was actually grayscale already and didn’t need transforming.
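
This matches how Matplotlib draws 2-D arrays: `imshow` maps single-channel data through a colormap, and the default one (`viridis`) runs from purple through green to yellow, so a grayscale image looks tinted unless `cmap="gray"` is passed. A minimal sketch with a synthetic image shaped like the output of `ToTensor`:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic grayscale image laid out as (channels, height, width).
chw = np.linspace(0.0, 1.0, 64, dtype=np.float32).reshape(1, 8, 8)
hw = chw.squeeze(0)  # imshow expects (height, width) for single-channel data

fig, (ax_default, ax_gray) = plt.subplots(1, 2)
# Left: default colormap, which is what produces the tinted look.
im_default = ax_default.imshow(hw)
# Right: explicit grayscale with a fixed 0..1 range instead of auto-scaling.
im_gray = ax_gray.imshow(hw, cmap="gray", vmin=0.0, vmax=1.0)
ax_default.set_title(im_default.get_cmap().name)
ax_gray.set_title(im_gray.get_cmap().name)
# Call plt.show() (with an interactive backend) or fig.savefig(...) to view.
```

For a tensor coming out of the dataset, converting with `.numpy()` before the `squeeze(0)` step gives the same kind of array.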