Doubt in .view() function in Neural Style Transfer

thomas-jac · May 3, 2020, 10:40am

I was going through the neural style transfer program using PyTorch and have a doubt about the use of view in the __init__ method of the Normalization class. Just before this class, we defined

cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

And inside the __init__ method were the lines

self.mean = torch.tensor(mean).view(-1, 1, 1)
self.std = torch.tensor(std).view(-1, 1, 1)

Why did we reshape them? Wouldn’t broadcasting take care of it?
My current understanding is that each of the three numbers in the tensors defined above is supposed to normalize a single dimension (one each for height, width and classes). Is this right?

It might be a bit silly, but I’m having a hard time getting an intuition in 3D. Any help will be much appreciated, thanks!

Edit:
The complete init method is

def __init__(self, mean, std):
    super(Normalization, self).__init__()
    # .view the mean and std to make them [C x 1 x 1] so that they can
    # directly work with image Tensor of shape [B x C x H x W].
    # B is batch size. C is number of channels. H is height and W is width.
    self.mean = torch.tensor(mean).view(-1, 1, 1)
    self.std = torch.tensor(std).view(-1, 1, 1)

chetan06 · May 3, 2020, 10:58am

Can you provide the complete code of __init__? It’s difficult to say anything without seeing the code.

thomas-jac · May 3, 2020, 11:00am

Done! Let me know if anything else is needed.

chetan06 · May 3, 2020, 11:09am

They are using a mean and std per channel to normalize each image. “Normalizing the dimensions” doesn’t sound quite right: we actually normalize the data within each channel (you could say we normalize the feature map, channel by channel), and that’s why the tensors get a shape like that. Hope that makes sense.
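
To make the per-channel idea concrete, here is a minimal sketch. The toy batch size and the names `broadcast_result` and `loop_result` are made up for illustration. It compares the broadcast form with an explicit loop that normalizes one channel at a time:

```python
import torch

# Hypothetical toy batch: 2 images, 3 channels, 4x5 pixels.
img = torch.rand(2, 3, 4, 5)

mean = torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1)  # shape [3, 1, 1]
std = torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1)   # shape [3, 1, 1]

# Broadcast version: [3, 1, 1] is stretched to match [2, 3, 4, 5].
broadcast_result = (img - mean) / std

# Explicit version: normalize one channel at a time with its own mean and std.
loop_result = torch.empty_like(img)
for c in range(img.shape[1]):
    loop_result[:, c] = (img[:, c] - mean[c, 0, 0]) / std[c, 0, 0]

same = torch.allclose(broadcast_result, loop_result)
```

Each of the three numbers therefore belongs to one channel, and it is applied to every pixel of that channel in every image of the batch.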

thomas-jac · May 3, 2020, 11:21am

Ah, that makes sense, but I can’t figure out how reshaping it that way allows it to normalize the data within each channel. Could you please illustrate with a small example?

I’m sorry, I’m pretty new to DL and PyTorch, and I’m having a bit of difficulty with certain concepts.

chetan06 · May 3, 2020, 2:55pm

The answer to your question lies in the code where they use this mean. There are many ways to write that code, and it may be that this reshaped tensor is what their version needs. Keep reading and you will get your answer.
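
A minimal sketch of why the shape matters there, assuming the mean is later subtracted from an image batch of shape [B x C x H x W], as the comment in __init__ suggests (the batch below and the names `mean_flat` and `mean_c` are hypothetical). Broadcasting lines shapes up starting from the last dimension, so a plain tensor of shape [3] is compared with W rather than C:

```python
import torch

img = torch.rand(2, 3, 4, 5)  # hypothetical [B x C x H x W] batch
mean_flat = torch.tensor([0.485, 0.456, 0.406])  # shape [3]

# Shapes are aligned from the right, so [3] is compared with W = 5 and fails.
try:
    img - mean_flat
    flat_works = True
except RuntimeError:
    flat_works = False

# After .view(-1, 1, 1) the shape is [3, 1, 1]: the 3 lines up with C,
# and the two 1s are stretched over H and W.
mean_c = mean_flat.view(-1, 1, 1)
centered = img - mean_c

# Every pixel of channel 0 had 0.485 subtracted from it.
channel0_ok = torch.allclose(centered[:, 0], img[:, 0] - 0.485)
```

If W happened to be 3, the unreshaped version would run without an error but would apply the three values along the width instead of across the channels, which is the kind of silent mistake the reshape avoids.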

You don’t have to be sorry. Anyone can get stuck, so always feel free to share your doubts with the community.