Hi all,
I am hoping to confirm that what I did for data processing and visualizing images makes sense. I am doing a binary classification problem where images are (480,640,3)-sized depth images of blankets on a table-like surface.
I have the following two gist codes here which one should be able to run in the same directory if I have set things correctly:
The first one has the data there (see bottom post) and loads it for ImageLoader
. It also computes mean and standard deviation, by putting all the numbers.extend( d_img[:,:,0].flatten() )
stuff into a numbers
list and then taking a mean and standard deviation. The mean turns out to be 93 and the standard deviation is 84. It’s high because I have lots of 0s and lots of brighter values.
First question: this is a correct way of computing per-channel mean? The depth images are replicated across 3 channels so the values would be the same across all channels. I imagine there is a more efficient way to do this, though, perhaps dynamically computing the standard deviation somehow? And also, I see on the ImageNet examples that the mean values are within [0,1], so I am not sure if the values here should be scaled as well …
Next, I went ahead to train the model (see second gist). I put this at the top:
MEAN = [93.8304761096, 93.8304761096, 93.8304761096]
STD = [84.9985507432, 84.9985507432, 84.9985507432]
because, again, data is replicated across three channels.
Here’s the data transforms:
data_transforms = {
'train': transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.ToTensor(),
transforms.Normalize(MEAN, STD)
]),
'valid': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(MEAN, STD)
]),
}
I used a pre-trained ResNet-18 model. I went into the training loop and took the first minibatch. Then, I saved all the images. It took me a long time to figure out the correct way to get the images back to what I wanted: it’s in _save_images
in the second gist. This will save into a directory and I see depth images that make sense, and which have been cropped correctly as you can see later in the second gist. I wanted to visualize the transforms. I had to do something like this:
But what is confusing is that I needed to do this snippet (see Gist for details)
img = img.transpose((1,2,0))
img = img*STD + MEAN
img = img*255.0
img = img.astype(int)
transpose to get it into (224,224,3), then undo STD, MEAN, and this is really weird: we then multiply by 255. I assume this undoes the scaling that the ToTensor()
transform does?
Second set of question(s): the data transformation that I used above makes sense (MEAN and STD on the domain data of interest), and the ToTensor()
method can be undone by multiplying the image by 255? Then, I assume MEAN and STD are correctly “adjusted” so that they reflect the rescaled image where pixels are in [0,1], rather than [0,255] as previously?
Sorry for the long message! I just wanted to make sure I was understanding PyTorch correctly. I’m happy to clarify anything,