Is this the correct way to do cross-channel normalization?

import torch

def NormalizeFex(real_features, fake_features):
    # Per-channel maximum over the spatial dimensions: (N, C, H, W) -> (N, C)
    real_features_max, _ = real_features.view(real_features.size(0), real_features.size(1), -1).max(dim=2)
    fake_features_max, _ = fake_features.view(fake_features.size(0), fake_features.size(1), -1).max(dim=2)

    # Elementwise max of real and fake, reshaped to (N, C, 1, 1) for broadcasting;
    # .data detaches it from the graph so the scale is treated as a constant
    features_max = torch.max(real_features_max, fake_features_max).unsqueeze(2).unsqueeze(3).data

    # Scale each channel into [0, 1] by its joint maximum (eps avoids division by zero)
    real_features = real_features / (features_max + 1e-7)
    fake_features = fake_features / (features_max + 1e-7)

    return real_features, fake_features

Is it correct to use .data when taking the maximum value?
The inputs are features extracted from VGG, and I want to normalize them cross-channel between 0 and 1, keeping in mind that the real features should not have a gradient.
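For reference, here is a minimal usage sketch with dummy tensors standing in for the VGG activations (the shapes are just an illustrative assumption):

# Dummy stand-ins for VGG feature maps: batch of 4, 512 channels, 14x14 spatial
real_features = torch.rand(4, 512, 14, 14)                      # target features, no gradient
fake_features = torch.rand(4, 512, 14, 14, requires_grad=True)  # generated-image features

real_norm, fake_norm = NormalizeFex(real_features, fake_features)
print(fake_norm.max().item())  # <= 1.0: each channel is scaled by its joint max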

You could probably wrap it in a

with torch.no_grad():
    ...

block instead as described here.
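For example, a minimal sketch of the same function with the max computed under no_grad (the function name here is just for illustration; the effect is equivalent to .data, no_grad is simply the recommended style):

def NormalizeFexNoGrad(real_features, fake_features):
    with torch.no_grad():
        # Joint per-channel max, computed outside the autograd graph
        real_max, _ = real_features.view(real_features.size(0), real_features.size(1), -1).max(dim=2)
        fake_max, _ = fake_features.view(fake_features.size(0), fake_features.size(1), -1).max(dim=2)
        features_max = torch.max(real_max, fake_max).unsqueeze(2).unsqueeze(3)

    # The division itself still tracks gradients for fake_features
    return real_features / (features_max + 1e-7), fake_features / (features_max + 1e-7)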

Thanks, I’m aware of this. My real concern is: is it correct to normalize by dividing by the .data (or no_grad) max of the variable?

The max is taken across fake_features and real_features, and real_features has no gradient.
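To make the concern concrete, here is a minimal check (illustrative names only) of what the detached max does to the gradient:

# The detached max acts as a constant scale: gradients still reach
# fake_features through the numerator, but none flow through the denominator.
real = torch.rand(1, 3, 4, 4)
fake = torch.rand(1, 3, 4, 4, requires_grad=True)

real_n, fake_n = NormalizeFex(real, fake)
fake_n.sum().backward()
print(fake.grad is not None)  # True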

I’m not sure what the use case would be.
Could you post a link to the paper or a reference implementation, or explain the use case a bit?

It is my own implementation, so I’m not sure whether it is the correct way or not.
The general idea: I’m extracting VGG features from my generated image and the target image, but these features vary in scale. I want to normalize them between 0 and 1, but with cross-channel normalization, so that each channel, together with its corresponding channel from the target image, is normalized by the maximum of that pair of channels.
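Roughly, the pipeline looks like this (a sketch only; cutting vgg16 at relu4_3 is an assumption, I just mean some fixed pretrained feature extractor):

import torch
import torchvision.models as models

# Frozen pretrained VGG16, truncated at relu4_3 (an arbitrary choice here;
# newer torchvision uses weights=... instead of pretrained=True)
vgg = models.vgg16(pretrained=True).features[:23].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

target_image = torch.rand(1, 3, 224, 224)                        # stand-in for the real image
generated_image = torch.rand(1, 3, 224, 224, requires_grad=True)

with torch.no_grad():
    real_features = vgg(target_image)    # target features carry no gradient
fake_features = vgg(generated_image)     # generated features stay in the graph

real_n, fake_n = NormalizeFex(real_features, fake_features)
loss = torch.nn.functional.l1_loss(fake_n, real_n)
loss.backward()                          # gradient reaches generated_image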