Here, the layers of the parallel networks are defined in the model itself, and this seems to work just fine. However, I am now trying to make the model easier to reuse by "extracting" the shared core model as follows:
import torch.nn as nn

class CoreModel(nn.Module):
    def __init__(self):
        super(CoreModel, self).__init__()
        # Define layers
        self.layer1 = ...
        self.layer2 = ...
        ...

    def forward(self, input):
        X = self.layer1(input)
        X = self.layer2(X)
        ...
        return X
class SiameseNetwork(nn.Module):
    def __init__(self, core_model):
        super(SiameseNetwork, self).__init__()
        self.core_model = core_model

    def forward(self, input1, input2):
        output1 = self.core_model(input1)
        output2 = self.core_model(input2)
        return output1, output2
core_model = CoreModel()
model = SiameseNetwork(core_model)
The problem is that when I call loss.backward() during training, it raises an error saying I cannot call backward() a second time. I assume that's because I wrapped the layers in an nn.Module.
Is there a right way to define the "core model" on its own and then just pass it to the Siamese network model as a parameter?
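For context, this is roughly the kind of training loop the model is meant to be used in. It is a minimal sketch only: the loss, optimizer, and data loader below are placeholders (not my actual code), chosen to show that backward() is called once per freshly built graph.

import torch
import torch.nn as nn

core_model = CoreModel()
model = SiameseNetwork(core_model)

# Placeholder choices for illustration: a pairwise loss and a hypothetical
# `loader` that yields (input1, input2, target) with targets of +1 / -1.
criterion = nn.CosineEmbeddingLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for input1, input2, target in loader:
    optimizer.zero_grad()
    output1, output2 = model(input1, input2)   # two forward passes through the shared core
    loss = criterion(output1, output2, target)
    loss.backward()                            # one backward() per iteration / per fresh graph
    optimizer.step()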
@ptrblck thanks! Your examples for Variant A and Variant B both work for me as well. When I use my "core model" (with an embedding layer, an RNN layer, and linear layers), it still only works with Variant A. All I did was put the code from CoreModel directly into the SiameseNetwork class, and I simply cannot see the difference at the moment.
I will probably need to slowly extend the basic working Variant B to see where it begins to break down.
EDIT: Yeah, slowly building up the basic network of Variant B did the trick. I cannot really tell where I went wrong in the first place. I assume(!) that I didn't correctly re-initialize the hidden state of the RNN layer at the right time(s).
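In case it helps anyone else: a sketch of what I think the working core model looks like conceptually. The layer sizes are made up and a GRU stands in for my RNN layer; the key point is that the hidden state is created fresh inside forward() instead of being stored on the module and carried across batches, which would keep the old graph alive and trigger the "backward a second time" error.

import torch
import torch.nn as nn

class CoreModel(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, out_dim=32):
        super(CoreModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        # Re-initialize the hidden state on every forward pass rather than
        # keeping it as a module attribute between batches.
        h0 = torch.zeros(1, x.size(0), self.rnn.hidden_size, device=x.device)
        emb = self.embedding(x)            # x: [batch, seq] of token ids
        out, _ = self.rnn(emb, h0)         # out: [batch, seq, hidden_dim]
        return self.fc(out[:, -1])         # embedding taken from the last time step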