I am trying to implement a paper that uses the activations of an Inception v3 model with the final softmax removed (so just the logits). If I understand correctly, that is exactly what the inception_v3 model from torchvision returns? If so, should I be calling inception_v3 with pretrained=True, transform_input=True and aux_logits=False? I don’t know what the last parameter does (I think it refers to the auxiliary logits?), but I doubt I need it, since my goal is to get the 1000-length tensor of logits, one for each of the 1000 ImageNet classes.
Also, with transform_input=True, what kind of input does the model expect? I know the model applies some normalization internally. Given this, what range of values should my input be in for the model to work properly?
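A small numerical check of what that internal normalization does. This assumes the per-channel rescaling applied when `transform_input=True` is the one in torchvision's `Inception3._transform_input`, reproduced below in NumPy. Under that assumption, an input that was normalized with the standard ImageNet mean/std gets mapped to exactly `2 * pixel - 1`, i.e. the `[-1, 1]` range the original Inception weights were trained with.

```python
import numpy as np

# Standard ImageNet statistics used by torchvision's Normalize transform
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])[:, None, None]
IMAGENET_STD = np.array([0.229, 0.224, 0.225])[:, None, None]

def imagenet_normalize(x):
    """x: (3, H, W) array of pixel values in [0, 1], as produced by ToTensor."""
    return (x - IMAGENET_MEAN) / IMAGENET_STD

def mimic_transform_input(x):
    """Mirrors the per-channel rescaling done when transform_input=True."""
    return x * (IMAGENET_STD / 0.5) + (IMAGENET_MEAN - 0.5) / 0.5

raw = np.random.default_rng(0).random((3, 4, 4))  # fake image in [0, 1)
out = mimic_transform_input(imagenet_normalize(raw))

print(np.allclose(out, 2 * raw - 1))  # True
```

So with `transform_input=True`, the input should be the usual ImageNet-style tensor: resized and center-cropped to 299x299, converted to `[0, 1]` with `ToTensor`, then passed through `Normalize` with the ImageNet mean/std.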
I apologize for the wall of questions. This is my first time working with pre-trained models.