Input to pre-trained Inception V3

Entropy · January 5, 2022, 1:40pm

I am trying to implement a paper that uses the activations of an Inception v3 model with the final softmax removed (so just the logits). If I understand correctly, that is exactly what the inception_v3 model from torchvision returns? So should I be calling inception_v3 with pretrained=True, transform_input=True and aux_logits=False? I don’t know what the last parameter does (I think it means auxiliary logits?) but I doubt I need that for my task, since my goal is to get the length-1000 tensor of logits, one for each of the 1000 ImageNet classes.

Also, with transform_input=True, what kind of input will the model expect? I know that some normalization happens inside the model itself. Given this, what range of values should my input be in so that the model functions properly?

I apologize for the wall of questions. This is my first time working with pre-trained models.

ptrblck · January 11, 2022, 5:52am

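Yes, the torchvision implementation returns the logits from the last layer, as well as the auxiliary logits if aux_logits=True, as seen in the forward method of the Inception3 class in the torchvision source. Note that the auxiliary logits are only returned while the model is in training mode.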

Yes, that’s right: aux_logits controls the auxiliary logits, and based on your description you could set it to False.

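transform_input=True will apply a per-channel rescaling that converts inputs normalized with the ImageNet mean and std into inputs normalized with a mean and std of 0.5, which was introduced to match the Inception/Google implementation. The model therefore expects images scaled to [0, 1] and then normalized with the usual ImageNet statistics.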
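To make the effect of transform_input=True concrete, here is a small sketch of the arithmetic in plain Python, assuming the input was first scaled to [0, 1] and normalized with the standard ImageNet mean and std. The function names are made up for illustration; the constants mirror the per-channel rescaling in torchvision's Inception3, as far as I can tell from the source.

```python
# Standard ImageNet statistics used by the usual torchvision preprocessing.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def imagenet_normalize(pixel: float, channel: int) -> float:
    """Usual preprocessing: pixel in [0, 1] -> (pixel - mean) / std."""
    return (pixel - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel]


def transform_input(value: float, channel: int) -> float:
    """Per-channel rescaling applied inside the model when transform_input=True."""
    scale = IMAGENET_STD[channel] / 0.5
    shift = (IMAGENET_MEAN[channel] - 0.5) / 0.5
    return value * scale + shift


def end_to_end(pixel: float, channel: int) -> float:
    """Preprocessing followed by the model's internal rescaling."""
    return transform_input(imagenet_normalize(pixel, channel), channel)
```

Algebraically the two steps collapse to `(pixel - 0.5) / 0.5`, so a pixel in [0, 1] ends up in [-1, 1] by the time it reaches the first convolution.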