I want to create a network (InceptionV3 without the auxiliary classifier, in this case) with multiple inputs, where the number of inputs could be anywhere between, say, 16 and 32 (I intend to feed it a whole batch).
Ideally, I’d be able to pass a variable (the number of inputs, which is the same as the batch size in this case) and get a network with that number of inputs.
But if it’s easier to create one with a fixed number of inputs (say 16), that’s OK too.
Shared weights, of course.
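One way to get "multiple inputs with shared weights" without building N separate branches: run a single backbone over a stacked tensor of images. Every image then passes through the same weights, and because the number of images is just a tensor dimension, it can change from call to call (16, 32, or anything else) without rebuilding the network. Below is a minimal sketch, assuming bags are stacked as a `(B, N, C, H, W)` tensor; `BagEncoder` and the tiny `nn.Sequential` backbone are hypothetical stand-ins (not InceptionV3), used only to keep the example light.

```python
import torch
import torch.nn as nn


class BagEncoder(nn.Module):
    """Applies one shared backbone to every image of every bag.

    Hypothetical helper: `backbone` is any module mapping (M, C, H, W) -> (M, D).
    """

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, bags: torch.Tensor) -> torch.Tensor:
        # bags: (B, N, C, H, W); N may differ between calls.
        b, n = bags.shape[:2]
        flat = bags.reshape(b * n, *bags.shape[2:])  # fold the instances into the batch dim
        feats = self.backbone(flat)                   # same weights for every image
        return feats.reshape(b, n, -1)                # (B, N, D): one vector per image


# Tiny stand-in backbone (NOT InceptionV3): conv -> global pool -> flatten.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
encoder = BagEncoder(backbone).eval()

with torch.no_grad():
    out16 = encoder(torch.randn(2, 16, 3, 32, 32))  # 2 bags of 16 images
    out32 = encoder(torch.randn(2, 32, 3, 32, 32))  # same network, 2 bags of 32 images
print(out16.shape, out32.shape)
```

With one labeled bag per step, B would simply be 1.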
Plus, after the last convolutional layer, I need to concatenate all the outputs of the same batch and then feed the result to a global average pooling layer (I guess I’d need to take the mean over the 16 outputs and then feed it to Inception’s AdaptiveAvgPool2d((1, 1)) layer?).
The output of the global average pooling would be the new embedding (one per batch).
From that point on it should be as usual: an FC layer (one output for regression, or optionally a few outputs for classification).
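On the "mean over the 16 outputs, then global average pooling" question: both steps are plain averages, so averaging the N feature maps and then pooling spatially gives the same embedding as pooling each map and then averaging the N vectors (as long as all maps have the same spatial size). The sketch below implements the order described above and checks the equivalence numerically. It assumes last-conv feature maps of shape `(N, 2048, H, W)`, which is what InceptionV3's final block produces (8x8 spatially at 299x299 input); `BagHead` and `num_outputs` are illustrative names, not part of torchvision.

```python
import torch
import torch.nn as nn


class BagHead(nn.Module):
    """Mean over a bag's feature maps -> global average pool -> FC (hypothetical)."""

    def __init__(self, in_channels: int = 2048, num_outputs: int = 1):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(in_channels, num_outputs)

    def embed(self, fmaps: torch.Tensor) -> torch.Tensor:
        # fmaps: (N, C, H, W) for the N images of one bag.
        bag_map = fmaps.mean(dim=0, keepdim=True)    # (1, C, H, W)
        return torch.flatten(self.pool(bag_map), 1)  # (1, C): the bag embedding

    def forward(self, fmaps: torch.Tensor) -> torch.Tensor:
        return self.fc(self.embed(fmaps))            # (1, num_outputs)


head = BagHead(in_channels=2048, num_outputs=1)
fmaps = torch.randn(16, 2048, 8, 8)  # random stand-in for 16 images' last-conv outputs
emb = head.embed(fmaps)
pred = head(fmaps)

# Pool each image first, then average over the bag: same embedding.
emb_alt = head.pool(fmaps).flatten(1).mean(dim=0, keepdim=True)
print(pred.shape, torch.allclose(emb, emb_alt, atol=1e-6))
```

Because the two orders agree, an alternative is to keep Inception's own pooling and just average the pooled 2048-d vectors, which requires fewer changes to the model.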
- In training, each batch has one label (only the batch as a whole is labeled, not the individual samples). That’s why I need to make these modifications.
- Basically the same goes for inference: all the images should get a single prediction; even if there are 100 batches, they all belong to one label. I’ll deal with slight modifications to inference later; for now I’m focusing on training (and specifically on the architecture).
- I know I’ll have to deal with the sampler later so that it samples the way I want. I have a custom dataset, and I think I know how to approach that.
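With one label per bag, a training step amounts to: embed all images of the bag, average, apply the head, and compare the single prediction with the single label. For inference over many chunks that share one label, the bag mean can be accumulated chunk by chunk, which in eval mode gives the same embedding as averaging all images at once. A minimal sketch, assuming a regression target and `nn.MSELoss`; `encoder` (images to per-image vectors), `head`, `train_step`, and `predict_bag` are hypothetical stand-ins, kept tiny so the example runs quickly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: encoder maps (N, 3, H, W) -> (N, D); head maps (1, D) -> (1, 1).
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 1)
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
criterion = nn.MSELoss()


def train_step(bag: torch.Tensor, label: torch.Tensor) -> float:
    """One optimisation step on a single bag (N images) with a single label."""
    encoder.train()
    head.train()
    optimizer.zero_grad()
    bag_emb = encoder(bag).mean(dim=0, keepdim=True)  # (1, D)
    pred = head(bag_emb).squeeze()                     # scalar prediction for the bag
    loss = criterion(pred, label)
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def predict_bag(images: torch.Tensor, chunk_size: int = 16) -> torch.Tensor:
    """One prediction for all images, processed chunk by chunk to bound memory."""
    encoder.eval()
    head.eval()
    total, count = None, 0
    for chunk in images.split(chunk_size):
        s = encoder(chunk).sum(dim=0, keepdim=True)    # running sum of embeddings
        total = s if total is None else total + s
        count += chunk.shape[0]
    return head(total / count).squeeze()               # mean over ALL images -> head


bag = torch.randn(16, 3, 32, 32)
label = torch.tensor(0.5)
losses = [train_step(bag, label) for _ in range(5)]

many = torch.randn(100, 3, 32, 32)  # many images that all share one label
chunked = predict_bag(many, chunk_size=16)
whole = predict_bag(many, chunk_size=100)
print(losses[0], losses[-1], torch.allclose(chunked, whole, atol=1e-5))
```

With InceptionV3 as the encoder, keep in mind that its BatchNorm layers normalise over the images of the current bag in training mode, while eval mode uses running statistics, which is what makes the chunked and all-at-once predictions agree.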
How do I make this modification to InceptionV3?
Thanks a lot!