Best practices with data processing and normalization --- images seem OK but is normalization OK?

Hi all,

I am hoping to confirm that what I did for data processing and visualizing images makes sense. I am doing a binary classification problem where images are (480,640,3)-sized depth images of blankets on a table-like surface.

I have two gists, which should be runnable from the same directory if I have set things up correctly:

The first one loads the data (see the bottom of the post) for the ImageLoader. It also computes the mean and standard deviation by calling numbers.extend( d_img[:,:,0].flatten() ) for each image to build one long numbers list, and then taking the mean and standard deviation of that list. The mean turns out to be about 93 and the standard deviation about 84. The standard deviation is high because I have lots of 0s along with lots of brighter values.

First question: is this a correct way of computing the per-channel mean? The depth images are replicated across all 3 channels, so the values are the same in each channel. I imagine there is a more efficient way to do this, though, perhaps computing the standard deviation incrementally? Also, I see in the ImageNet examples that the mean values are within [0,1], so I am not sure whether my values should be scaled as well …
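On the efficiency point: one way to avoid holding every pixel in a giant list is to accumulate a running sum and sum of squares per image, then recover the std from E[x²] − E[x]². A minimal numpy sketch (the `image_paths` / `load_fn` names are placeholders for however you load each depth image):

```python
import numpy as np

def channel_stats(image_paths, load_fn):
    """Accumulate sum and sum-of-squares so all pixels never sit in memory at once."""
    total = 0.0
    total_sq = 0.0
    count = 0
    for path in image_paths:
        d_img = load_fn(path)                     # e.g. a (480, 640, 3) depth image
        vals = d_img[:, :, 0].astype(np.float64)  # channels are identical copies
        total += vals.sum()
        total_sq += (vals ** 2).sum()
        count += vals.size
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)   # std via E[x^2] - E[x]^2
    return mean, std
```

This gives the same result as flattening everything into one array and calling `np.mean` / `np.std`, but with constant memory.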

Next, I went ahead to train the model (see second gist). I put this at the top:

    MEAN = [93.8304761096, 93.8304761096, 93.8304761096]
    STD = [84.9985507432, 84.9985507432, 84.9985507432]

because, again, data is replicated across three channels.

Here’s the data transforms:

    data_transforms = {
        'train': transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(MEAN, STD),
        ]),
        'valid': transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(MEAN, STD),
        ]),
    }

I used a pre-trained ResNet-18 model. In the training loop I took the first minibatch and saved all the images. It took me a long time to figure out the correct way to get the images back to what I wanted; it's in _save_images in the second gist. This saves into a directory, and I see depth images that make sense and that have been cropped correctly, as you can see later in the second gist. To visualize the transformed images, I had to do something like this (see the gist for details):

        img = img.transpose((1,2,0))   # (3,224,224) -> (224,224,3)
        img = img*STD + MEAN           # undo Normalize
        img = img*255.0                # undo ToTensor's [0,1] scaling
        img = img.astype(np.uint8)     # back to integer pixel values

transpose to get it into (224,224,3), then undo STD and MEAN, and, the really weird part, multiply by 255. I assume this undoes the scaling that the ToTensor() transform does?

Second question: does the data transformation above make sense (MEAN and STD computed on the domain data of interest), and can the ToTensor() scaling be undone by multiplying the image by 255? If so, I assume MEAN and STD need to be "adjusted" so that they reflect the rescaled image where pixels are in [0,1], rather than [0,255] as previously?
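To sanity-check my understanding, here is a numpy-only sketch that emulates what I believe ToTensor and Normalize do (HWC uint8 → CHW float in [0,1], then per-channel (x − mean)/std), followed by the undo steps from above. The emulated semantics and shapes are my assumptions, not actual torchvision code:

```python
import numpy as np

MEAN = np.array([93.8304761096 / 255.0] * 3)  # stats rescaled to [0,1] units
STD = np.array([84.9985507432 / 255.0] * 3)

# A uint8 HWC image, as PIL would hand it to ToTensor
rng = np.random.default_rng(0)
img_uint8 = rng.integers(0, 256, (224, 224, 3)).astype(np.uint8)

# ToTensor (emulated): HWC uint8 -> CHW float in [0, 1]
tensor = img_uint8.transpose((2, 0, 1)).astype(np.float32) / 255.0

# Normalize (emulated): per-channel (x - mean) / std
normed = (tensor - MEAN[:, None, None]) / STD[:, None, None]

# Undo: transpose back to HWC, un-normalize, rescale to [0, 255]
img = normed.transpose((1, 2, 0))
img = img * STD + MEAN
img = (img * 255.0).round().astype(np.uint8)

assert np.array_equal(img, img_uint8)  # round trip recovers the original pixels
```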

Sorry for the long message! I just wanted to make sure I was understanding PyTorch correctly. I’m happy to clarify anything,

The calculation of the mean and std on your images looks good.

There is a small issue in your transformation.
As you said, the mean and std for the ImageNet data are smaller than yours, because they were calculated on the normalized tensors.
ToTensor will transform your PIL.Images to normalized tensors in the range [0, 1].
If you are using Normalize afterwards, you should make sure to use the mean and std calculated on these tensor images in the range [0, 1]. However, since you've already computed these values, you could just scale them by 1./255.

The same applies to undo the normalization using the mean and std.
In your current code snippet you are assuming mean and std were calculated on the normalized tensors.
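As a quick numerical check that scaling the raw statistics by 1./255 is equivalent (a small numpy sketch, not your actual pipeline):

```python
import numpy as np

RAW_MEAN, RAW_STD = 93.8304761096, 84.9985507432
MEAN, STD = RAW_MEAN / 255.0, RAW_STD / 255.0  # stats in [0, 1] units

pixels = np.array([0, 50, 93, 200, 255], dtype=np.float64)

# Normalizing the [0, 1]-scaled pixels with the scaled stats...
a = (pixels / 255.0 - MEAN) / STD
# ...matches normalizing the raw [0, 255] pixels with the raw stats.
b = (pixels - RAW_MEAN) / RAW_STD

assert np.allclose(a, b)
```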


Thanks @ptrblck

I fixed the code a bit. The issue is that, while the saved images still look fine, my original setup was not the correct way to normalize the data. The way I had it earlier, with the mean and std taken from pixels in the range [0,255], the data gets transformed like this:

  • ToTensor transforms images and scales into range [0,1]
  • Then Normalize will do this: ([0,1] - rawmean) / rawstd

What we really want is the scaled mean and scaled std, as you pointed out (where by scaled I mean values in the range [0,1], not [0,255]). Of course, for undoing the normalization, it's correct either way:

(([0,1] - rawmean) / rawstd) * rawstd + rawmean = [0,1]
and then we multiply by 255.


(([0,1] - scaledmean) / scaledstd) * scaledstd + scaledmean = [0,1]
and then we multiply by 255.
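A quick numerical check of the two identities above (the constants cancel regardless of which pair is used; plain numpy, just to convince myself):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 7)  # pixel values already in [0, 1]

raw_mean, raw_std = 93.8304761096, 84.9985507432
scaled_mean, scaled_std = raw_mean / 255.0, raw_std / 255.0

# Either pair of constants cancels itself out when undoing:
undo_raw = ((x - raw_mean) / raw_std) * raw_std + raw_mean
undo_scaled = ((x - scaled_mean) / scaled_std) * scaled_std + scaled_mean

assert np.allclose(undo_raw, x)
assert np.allclose(undo_scaled, x)
```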

I will simply use scaled mean and scaled std on my data from now on.