I would like to subtract the mean pixel from my loaded image:
import torch
from PIL import Image
import torchvision.transforms as transforms

image = Image.open(image_name)
loader = transforms.Compose([transforms.Resize(image_size), transforms.ToTensor()])  # resize and convert to tensor
image = loader(image)
I assume I can use transforms.Normalize(mean, std), like this?

transforms.Normalize(mean=[103.939, 116.779, 123.68])
I am trying to replicate the image processing steps from this code: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L416-L437
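As far as I can tell, the linked Lua code scales the image to the 0-255 range, swaps RGB to BGR, and subtracts the Caffe mean pixel. In PyTorch terms that would be roughly the following (my own sketch of those steps, not code from the repo):

def caffe_preprocess(img):
    # img: FloatTensor of shape [3, h, w] with values in [0, 1]
    mean_pixel = torch.FloatTensor([103.939, 116.779, 123.68]).view(3, 1, 1)
    img = img[[2, 1, 0]] * 255.0  # RGB -> BGR, scale to 0-255
    return img - mean_pixel       # subtract the mean pixel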
transforms.Normalize is applied on torch.Tensors.
Usually you would have something like this:
transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(250),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
Since ToTensor() scales the Tensor to [0, 1], you should use “normalized” mean and stddev values.
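For example, dividing the 0-255 mean pixel values from the neural-style code by 255 gives values in the range Normalize expects after ToTensor(). A minimal sketch (std=[1, 1, 1] is an assumption here that leaves the scale untouched, so only the mean is subtracted):

caffe_mean = [103.939, 116.779, 123.68]        # mean pixel in the 0-255 range
scaled_mean = [m / 255.0 for m in caffe_mean]  # -> roughly [0.4076, 0.4580, 0.4850]
normalize = transforms.Normalize(mean=scaled_mean, std=[1.0, 1.0, 1.0])  # std of 1: subtract the mean only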
@ptrblck Is there a way to make ToTensor() scale to 0-255 instead of 0-1? Or would I manually have to convert to a [0, 255] Tensor?
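One way to do the manual conversion would be a Lambda transform that scales the result of ToTensor() back up, something like this sketch:

loader = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 255),  # undo the 0-1 scaling
])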
Edit: I think this might work
def preprocess(img):
    mean_pixel = torch.DoubleTensor([103.939, 116.779, 123.68])
    img = torch.FloatTensor(img)
    # fill an image-sized tensor with the mean pixel (copy_ broadcasts the 3 values)
    mean_pixel_image = torch.Tensor()
    mean_pixel_image.resize_as_(img).copy_(mean_pixel)
    mean_pixel_image = mean_pixel_image.float()
    # subtract the mean pixel from every pixel
    img = img - mean_pixel_image
    return img
image = spi.imread(params.image, mode="RGB").astype(float)
image = imresize(image, params.image_size, interp='bilinear')
image = preprocess(image)
But that results in this error when I try to run my net forward with the image:
net.updateOutput(image)
  File "/usr/local/lib/python2.7/dist-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/usr/local/lib/python2.7/dist-packages/torch/legacy/nn/SpatialConvolution.py", line 84, in updateOutput
    self._viewWeight()
  File "/usr/local/lib/python2.7/dist-packages/torch/legacy/nn/SpatialConvolution.py", line 75, in _viewWeight
    self.gradWeight = self.gradWeight.view(self.nOutputPlane, self.nInputPlane * self.kH * self.kW)
RuntimeError: invalid argument 2: size '[64 x 27]' is invalid for input with 0 elements at /home/ubuntu/pytorch/aten/src/TH/THStorage.c:41
Your code looks OK. You don’t need to resize the mean_pixel, since the Tensor will be broadcast.
image_tensor = torch.from_numpy(image.astype(np.float32))
image_tensor = image_tensor - mean_pixel.float().view(1, 1, -1)  # broadcast the mean over height and width
image_tensor = image_tensor.permute(2, 0, 1)  # move channels to dimension 0
I assume spi.imread is scipy’s imread function, so that image will have the shape [h, w, c].
You have to permute your Tensor so that the channel dimension is dimension 0 before feeding it to your model.
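Putting the pieces together, a minimal sketch of the full preprocessing (the unsqueeze for a batch dimension is an assumption, depending on what your model expects):

import numpy as np
import torch

def preprocess(image):
    # image: numpy array of shape [h, w, c] with values in the 0-255 range
    mean_pixel = torch.FloatTensor([103.939, 116.779, 123.68])
    image_tensor = torch.from_numpy(image.astype(np.float32))
    image_tensor = image_tensor - mean_pixel.view(1, 1, -1)  # broadcast over h and w
    image_tensor = image_tensor.permute(2, 0, 1)             # [c, h, w]
    return image_tensor.unsqueeze(0)                         # [1, c, h, w]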