I have written a model; the architecture is as follows:
```
CNNLSTM(
  (cnn): CNNText(
    (embed): Embedding(19410, 300, padding_idx=0)
    (convs1): ModuleList(
      (0): Conv2d(1, 32, kernel_size=(3, 300), stride=(1, 1))
      (1): Conv2d(1, 32, kernel_size=(5, 300), stride=(1, 1))
      (2): Conv2d(1, 32, kernel_size=(7, 300), stride=(1, 1))
    )
    (dropout): Dropout(p=0.6)
    (fc1): Linear(in_features=96, out_features=1, bias=True)
  )
  (lstm): RNN(
    (embedding): Embedding(19410, 300, padding_idx=0)
    (rnn): LSTM(300, 150, batch_first=True, bidirectional=True)
    (attention): Attention(
      (dense): Linear(in_features=300, out_features=1, bias=True)
      (tanh): Tanh()
      (softmax): Softmax()
    )
    (fc1): Linear(in_features=300, out_features=50, bias=True)
    (dropout): Dropout(p=0.5)
    (fc2): Linear(in_features=50, out_features=1, bias=True)
  )
  (fc1): Linear(in_features=146, out_features=1, bias=True)
)
```
I trained the RNN and the CNN separately on the same dataset, and I have their weights saved. In the combined model, I load those weights using the following function:
```python
def load_pretrained_weights(self, model='cnn', path=None):
    if model not in ['cnn', 'rnn']:
        raise AttributeError("Model must be either rnn or cnn")
    if model == 'cnn':
        self.cnn.load_state_dict(torch.load(path))
    if model == 'rnn':
        self.lstm.load_state_dict(torch.load(path))
```
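For context, I call it roughly like this when building the combined model (the constructor arguments and checkpoint paths below are placeholders, not my exact code):

```python
# Illustrative setup; paths and constructor arguments are placeholders.
model = CNNLSTM()  # constructed the same way as for training
model.load_pretrained_weights(model='cnn', path='cnn_best.pt')
model.load_pretrained_weights(model='rnn', path='rnn_best.pt')
```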
And I freeze the submodules using this function:
```python
def freeze(self):
    for p in self.cnn.parameters():
        p.requires_grad = False
    for p in self.lstm.parameters():
        p.requires_grad = False
```
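Since both submodules are frozen, only the top-level fc1 should receive gradients, so the optimizer is built over just the trainable parameters, roughly like this (the optimizer choice and learning rate are illustrative assumptions):

```python
import torch

# Collect only the parameters that still require gradients (the top-level fc1).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # optimizer type and lr are assumptions
```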
Then I trained the combined model and got better results than either submodule achieved when trained on its own. I used an early-stopping technique in my epoch loop to save the best parameters.
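The early-stopping logic is along these lines (a simplified sketch; `train_one_epoch`, `evaluate`, the patience value, and the file name are placeholders for my actual code):

```python
import torch

best_val_loss = float('inf')
patience, bad_epochs = 5, 0  # patience value is an assumption

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)       # placeholder training step
    val_loss = evaluate(model, val_loader)  # placeholder validation pass
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_cnnlstm.pt')  # save best parameters
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop early once validation stops improving
```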
After training, I created a new instance of the same class, but when I load the saved "best" parameters into it, I do not get similar results. I tried the same procedure with each submodule (the RNN and CNNText here) on its own and it worked; in the combined case, however, loading the checkpoint does not give the same performance.
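The reload step is essentially this (again a sketch; the constructor arguments, file name, and evaluation helper are placeholders):

```python
# Rebuild the model and load the saved "best" checkpoint into it.
new_model = CNNLSTM()  # same constructor arguments as the trained model
new_model.load_state_dict(torch.load('best_cnnlstm.pt'))
new_model.eval()  # evaluation mode, so dropout is disabled
score = evaluate(new_model, test_loader)  # placeholder evaluation helper
```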
A few experiments I tried:
- I first loaded the saved weights of each submodule and then loaded the best parameters of the combined model on top; this got somewhat close to the best result.
- I took the hidden features from each submodule before its dropout is applied; this did better than the previous experiment, but still not the best!
Please help me understand what is happening here. I am new to deep learning concepts.