ImageNet example with Inception v3

The ‘aux’ layer is used only during training; at inference time, you just get the output of the final layer.
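To illustrate, a minimal sketch (the pretrained= keyword and whether the training-mode return is a plain tuple or a namedtuple depend on your torchvision version):

import torch
from torchvision import models

model = models.inception_v3(pretrained=True)  # aux_logits=True by default

x = torch.randn(2, 3, 299, 299)  # Inception v3 takes 299x299 inputs

model.train()
out = model(x)        # training mode: (logits, aux_logits)
print(len(out))       # 2

model.eval()
with torch.no_grad():
    out = model(x)    # eval mode: a single (2, 1000) logits tensor
print(out.shape)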

Please correct me if I’m wrong: there’s no need to do the mandatory normalization (“The images have to be loaded in to a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]”), since it’s already included in the model, as long as transform_input is set to True: https://github.com/pytorch/vision/blob/c1746a252372dc62a73766a5772466690fc1b8a6/torchvision/models/inception.py#L72-L76.
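For context, here is a sketch of what the linked transform_input branch computes (the function name is mine; the per-channel constants follow the documented ImageNet statistics, so check the linked commit for the exact source):

import torch

def transform_input_sketch(x: torch.Tensor) -> torch.Tensor:
    # Equivalent to the per-channel rescaling in the linked lines:
    # undo the ImageNet mean/std normalization, then map the
    # recovered [0, 1] image to the [-1, 1] range Inception uses.
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    return (x * std + mean - 0.5) / 0.5

Note that this rescaling starts from ImageNet-normalized values and re-maps them, which is the detail the normalization question turns on.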

For those who are still stuck on this issue (from here and elsewhere):

# Inception v3 returns (logits, aux_logits) in training mode,
# so sum the criterion over both heads when a tuple comes back.
if isinstance(outputs, tuple):
    loss = sum(criterion(o, labels) for o in outputs)
else:
    loss = criterion(outputs, labels)
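
To show where this check fits, here is a minimal training-step sketch (the loader below is a dummy stand-in for a real DataLoader, and the optimizer settings are illustrative):

import torch
import torch.nn as nn
from torchvision import models

model = models.inception_v3(pretrained=True)  # aux_logits=True by default
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Dummy stand-in for a real DataLoader of (images, labels) batches.
loader = [(torch.randn(2, 3, 299, 299), torch.randint(0, 1000, (2,)))]

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    outputs = model(images)  # (logits, aux_logits) in training mode
    if isinstance(outputs, tuple):
        loss = sum(criterion(o, labels) for o in outputs)
    else:
        loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()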

If this is true, then the master documentation needs to be changed. It states: “All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]”.
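
For reference, the documented preprocessing corresponds to the usual torchvision pipeline; the crop size below is adjusted to 299, since Inception v3 takes larger inputs than the 224 most models use:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(299),            # Inception v3 takes 299x299, not 224x224
    transforms.CenterCrop(299),
    transforms.ToTensor(),             # loads pixel values into [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])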


Your answer saved my life.