Hi all,

I am hoping to confirm that what I did for data processing and visualizing images makes sense. I am doing a binary classification problem where images are (480,640,3)-sized depth images of blankets on a table-like surface.

I have the following two gist scripts, which one should be able to run in the same directory if I have set things up correctly.

The first gist loads the data (see the bottom of this post) for `ImageLoader`. It also computes the mean and standard deviation by putting all the `numbers.extend( d_img[:,:,0].flatten() )` values into a `numbers` list and then taking its mean and standard deviation. The mean turns out to be about 93 and the standard deviation about 84. The standard deviation is high because I have lots of 0s and lots of brighter values.

**First question**: is this a correct way of computing the per-channel mean? The depth images are replicated across 3 channels, so the values are the same in all channels. I imagine there is a more efficient way to do this, though, perhaps computing the standard deviation incrementally somehow? Also, I see in the ImageNet examples that the mean values are within [0,1], so I am not sure whether the values here should be scaled as well …
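For reference, here is one way the incremental computation could look: a sketch of a streaming mean/std that accumulates a running sum and sum of squares instead of keeping every pixel in a list, using the identity Var[X] = E[X²] − E[X]². The function name and the assumption that the input is an iterable of (480, 640, 3) uint8 arrays are mine, not from the gists; since the depth value is replicated across channels, only channel 0 is read.

```python
import numpy as np

def running_mean_std(depth_images):
    """Streaming per-channel mean/std over channel 0 of each depth image.

    Avoids holding all pixels in memory at once: accumulates the sum and
    the sum of squares, then uses Var[X] = E[X^2] - E[X]^2.
    """
    total, total_sq, count = 0.0, 0.0, 0
    for d_img in depth_images:
        ch = d_img[:, :, 0].astype(np.float64)  # one channel, promoted for accuracy
        total += ch.sum()
        total_sq += np.square(ch).sum()
        count += ch.size
    mean = total / count
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std
```

This matches the two-pass result up to floating-point error; for very long streams a numerically safer variant is Welford's online algorithm.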

Next, I went ahead to train the model (see second gist). I put this at the top:

```
MEAN = [93.8304761096, 93.8304761096, 93.8304761096]
STD = [84.9985507432, 84.9985507432, 84.9985507432]
```

because, again, data is replicated across three channels.

Here are the data transforms:

```
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(MEAN, STD)
    ]),
    'valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(MEAN, STD)
    ]),
}
```

I used a pre-trained ResNet-18 model. I went into the training loop, took the first minibatch, and saved all the images. It took me a *long time* to figure out the correct way to get the images back to what I wanted: it's in `_save_images` in the second gist. This saves into a directory, and I see depth images that make sense and that have been cropped correctly, as you can see later in the second gist.

What is confusing is that, to visualize the transformed images, I needed to do this snippet (see the gist for details):

```
img = img.transpose((1,2,0))  # (3,224,224) -> (224,224,3)
img = img*STD + MEAN          # undo Normalize
img = img*255.0               # undo ToTensor's scaling to [0,1]
img = img.astype(int)
```

i.e., transpose to get it into (224,224,3), then undo STD and MEAN, and, this is the really weird part, then multiply by 255. I assume this undoes the scaling that the `ToTensor()` transform does?

**Second set of question(s)**: does the data transformation I used above make sense (MEAN and STD computed on the domain data of interest), and can the `ToTensor()` method be undone by multiplying the image by 255? Then, I assume MEAN and STD are correctly "adjusted" so that they reflect the rescaled image where pixels are in [0,1], rather than [0,255] as previously?

Sorry for the long message! I just wanted to make sure I was understanding PyTorch correctly. I'm happy to clarify anything.