What does it mean to normalize images for ResNet?

I’m using a pretty simple set of steps to prepare images for feature extraction from a pretrained ResNet-152 model.


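import torchvision
from PIL import Image
from torchvision import models

# Load the image; convert to RGB so it has the 3 channels Normalize expects
img = Image.open("Documents/img.png").convert("RGB")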

# Load the pretrained model
model = models.resnet152(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('fc')

# Set model to evaluation mode
model.eval()

transformations = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
t_img = transformations(img)


What I'm trying to understand is what it means to normalize the images with those means and standard deviations. We obviously aren't setting the pixel values to those numbers, right? My understanding is that pixel values should be integers between 0 and 255, so what does it mean to normalize in this context?

Normalize, in the above case, means subtracting the per-channel mean from each pixel and dividing the result by the per-channel standard deviation. The input image is expected to be float values in the range [0, 1], not integers. So when you load the image, you need to divide it by 255.
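To make the arithmetic concrete, here is a minimal NumPy sketch of the two steps (rescale to [0, 1], then subtract the mean and divide by the std for each channel). It only mimics what the transforms do and is not the torchvision implementation; the function name `scale_and_normalize` and the tiny two-pixel image are made up for illustration.

```python
import numpy as np

# Per-channel statistics quoted in the question (RGB order).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def scale_and_normalize(img_uint8):
    """Rescale an H x W x 3 uint8 image to [0, 1], then normalize per channel."""
    scaled = img_uint8.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    normalized = (scaled - MEAN) / STD              # broadcasts over the channel axis
    return np.transpose(normalized, (2, 0, 1))      # H x W x C -> C x H x W


# Hypothetical 1 x 2 image: one black pixel and one white pixel.
tiny = np.array([[[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
out = scale_and_normalize(tiny)

print(out.shape)     # channels first
print(out[:, 0, 0])  # black pixel: -mean / std for each channel
print(out[:, 0, 1])  # white pixel: (1 - mean) / std for each channel
```

So the normalized values are no longer pixel intensities at all: with these statistics they fall roughly between -2.1 and 2.6, centered near zero for a typical natural image.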

[quote=“Luke_Kollmorgen, post:1, topic:96160”]
torchvision.transforms.ToTensor()
[/quote] already rescales the pixel values from [0, 255] to [0, 1].

But why constant values of mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]? Shouldn’t each image be normalized to zero mean and unit standard deviation, or are this mean and std calculated from the whole dataset?


I have the same question. How are these values determined?

These stats are calculated from the ImageNet training dataset.
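As a rough illustration of what dataset-level statistics mean, the sketch below computes one mean and one std per channel over every pixel of every image, then applies those same constants to all images. The random array is a hypothetical stand-in for a real dataset, and `channel_stats` is a made-up helper, not the procedure actually used to produce the ImageNet numbers.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and std over all pixels of all images.

    `images` is an N x H x W x 3 float array already scaled to [0, 1].
    """
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return mean, std


# Hypothetical stand-in for a dataset: 8 random 4 x 4 RGB images in [0, 1).
rng = np.random.default_rng(0)
fake_dataset = rng.random((8, 4, 4, 3))

mean, std = channel_stats(fake_dataset)
normalized = (fake_dataset - mean) / std  # same constants for every image

# Over the whole dataset, each channel has mean ~0 and std ~1 ...
print(normalized.mean(axis=(0, 1, 2)), normalized.std(axis=(0, 1, 2)))
# ... but a single image generally does not, which is expected.
print(normalized[0].mean(axis=(0, 1)))
```

Using fixed dataset-wide constants keeps every input on the same scale the pretrained network saw during training, whereas per-image normalization would erase differences in overall brightness and contrast between images.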