I’m trying to load 8 pretrained models into another model in PyTorch. This is the definition of the head I want to load (it sits on top of a VGG frontend):
self.output_layer = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=3, padding=2, dilation=2),
    nn.Conv2d(512, 256, kernel_size=3, padding=2, dilation=2),
    nn.Conv2d(256, 128, kernel_size=3, padding=2, dilation=2),
    nn.Conv2d(128, 64, kernel_size=3, padding=2, dilation=2),
    nn.Conv2d(64, 1, kernel_size=1),
)
I’m deep-copying this module into a list, loading the weights from the pretrained models into the copies, and each copy is used on the forward pass as an expert in a gating block.
self.branches = []
for i in range(8):
    self.branches.append(copy.deepcopy(self.output_layer))
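For reference, here is a sketch of how that branch setup could load a different pretrained checkpoint into each copy (the checkpoint paths and the helper name are my assumptions, and I use nn.ModuleList rather than a plain list so the copies are registered with the parent module):

```python
import copy
import torch
import torch.nn as nn

def make_branches(template, checkpoint_paths):
    """Deep-copy the template once per expert and load that expert's weights.

    Using nn.ModuleList (instead of a plain Python list) registers each copy
    as a submodule, so its parameters appear in .parameters() and state_dict().
    """
    branches = nn.ModuleList()
    for path in checkpoint_paths:
        branch = copy.deepcopy(template)
        state = torch.load(path, map_location="cpu")
        branch.load_state_dict(state)  # each expert gets its own weights
        branches.append(branch)
    return branches
```

Deep-copying first and then calling load_state_dict means the copies start identical in structure but end up with independent, per-expert weights.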
Forward pass:
x = self.frontend(x)
regressors = []
for i in range(8):
    regressors.append(self.branches[i](x))
stack = torch.cat(regressors, dim=1)
…
final = gate[:, :, None, None] * stack
output = torch.sum(final, dim=1, keepdim=True)
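To make the shapes concrete, the gated combination above can be sketched with dummy tensors (my assumptions: each expert outputs one channel, and I stand in a softmax for the elided gate computation):

```python
import torch

B, H, W, n_experts = 2, 6, 6, 8

# Each expert output: [B, 1, H, W], as with the Conv2d(64, 1, ...) head.
regressors = [torch.randn(B, 1, H, W) for _ in range(n_experts)]
stack = torch.cat(regressors, dim=1)                    # [B, 8, H, W]

# Placeholder for the gate; the real gating block is elided in the post.
gate = torch.softmax(torch.randn(B, n_experts), dim=1)  # [B, 8]

# Broadcast the per-expert weights over the spatial dimensions.
final = gate[:, :, None, None] * stack                  # [B, 8, H, W]
output = torch.sum(final, dim=1, keepdim=True)          # [B, 1, H, W]
```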
These pretrained layers are not showing up in my model. When I run torchsummary on the model, I only see the VGG and gating layers; the output_layer branches are missing.
Without deep-copying the output_layer, I see all the layers and parameters in the torchsummary output (10,798,408 parameters with deepcopy vs. 42,067,280 without). However, I need deepcopy to load each model’s own weights, not just create a list of references to the same output_layer sharing one set of weights.
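A quick check (a sketch with made-up module names) reproduces this symptom: deepcopy itself preserves all parameters, but copies held in a plain Python list are invisible to the parent module, whereas the same copies in an nn.ModuleList are counted:

```python
import copy
import torch.nn as nn

branch = nn.Conv2d(3, 2, kernel_size=1)  # tiny stand-in for output_layer

class WithList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain list: the copies are NOT registered as submodules.
        self.branches = [copy.deepcopy(branch) for _ in range(8)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: the copies ARE registered as submodules.
        self.branches = nn.ModuleList(copy.deepcopy(branch) for _ in range(8))

def count_params(m):
    return sum(p.numel() for p in m.parameters())
```

Tools like torchsummary walk the module tree via .parameters()/named_modules(), which is why only registered submodules show up in its report.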
In the meantime, as a workaround, I defined 8 separate output_layer attributes on the model and added those to the branches list. This seems to work okay.