The ‘aux’ layer is used only during training. At inference time, you get just the output of the final layer.
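A minimal sketch showing this behavior with torchvision’s Inception v3 (the 299×299 input size and the tuple output in train mode reflect torchvision’s behavior; details may vary by version):

    import torch
    from torchvision import models

    model = models.inception_v3(aux_logits=True)
    x = torch.randn(2, 3, 299, 299)

    model.train()
    out = model(x)                 # train mode: (logits, aux_logits)
    print(isinstance(out, tuple))  # True

    model.eval()
    with torch.no_grad():
        out = model(x)             # eval mode: only the final logits
    print(out.shape)               # torch.Size([2, 1000])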
Please correct me if I’m wrong: there’s no need to do the mandatory normalization (“The images have to be loaded in to a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]”), since it’s already included in the model, as long as transform_input is set to True: https://github.com/pytorch/vision/blob/c1746a252372dc62a73766a5772466690fc1b8a6/torchvision/models/inception.py#L72-L76
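For reference, the linked transform_input block does roughly the following (a paraphrase from memory, not a verbatim copy; check the linked commit for the exact code). It remaps each channel using the ImageNet mean/std constants, which is what the claim above relies on:

    import torch

    def transform_input(x):
        # Per-channel remap using the ImageNet mean/std constants,
        # paraphrased from the linked inception.py lines
        x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
        x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
        x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
        return torch.cat((x_ch0, x_ch1, x_ch2), 1)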
For those who are still stuck on this issue, the following (pieced together from other answers in this thread) handles both the tuple and the single-output case:
    if isinstance(outputs, tuple):
        # train mode with aux_logits=True: sum the loss over the
        # main and auxiliary outputs
        loss = sum(criterion(o, labels) for o in outputs)
    else:
        # eval mode (or aux_logits=False): a single output tensor
        loss = criterion(outputs, labels)
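For context, here is a minimal training-step sketch around that snippet (the model, optimizer, and `loader` are illustrative placeholders, not from the original posts):

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.inception_v3(aux_logits=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

    model.train()
    for images, labels in loader:  # `loader` is your DataLoader
        optimizer.zero_grad()
        outputs = model(images)
        if isinstance(outputs, tuple):
            loss = sum(criterion(o, labels) for o in outputs)
        else:
            loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()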
If this is true, then the master documentation needs to be changed. It states: “All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]”.
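In transforms code, the documented preprocessing corresponds to something like this (the 299 crop size is Inception v3’s expected input; the docs quoted above only say “at least 224”):

    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(299),
        transforms.CenterCrop(299),
        transforms.ToTensor(),  # scales pixels into [0, 1]
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])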
Your answer saved my life.