Different results after saving and loading


(Bihan Sen) #1

I have written a model with the following architecture:

CNNLSTM(
  (cnn): CNNText(
    (embed): Embedding(19410, 300, padding_idx=0)
    (convs1): ModuleList(
      (0): Conv2d(1, 32, kernel_size=(3, 300), stride=(1, 1))
      (1): Conv2d(1, 32, kernel_size=(5, 300), stride=(1, 1))
      (2): Conv2d(1, 32, kernel_size=(7, 300), stride=(1, 1))
    )
    (dropout): Dropout(p=0.6)
    (fc1): Linear(in_features=96, out_features=1, bias=True)
  )
  (lstm): RNN(
    (embedding): Embedding(19410, 300, padding_idx=0)
    (rnn): LSTM(300, 150, batch_first=True, bidirectional=True)
    (attention): Attention(
      (dense): Linear(in_features=300, out_features=1, bias=True)
      (tanh): Tanh()
      (softmax): Softmax()
    )
    (fc1): Linear(in_features=300, out_features=50, bias=True)
    (dropout): Dropout(p=0.5)
    (fc2): Linear(in_features=50, out_features=1, bias=True)
  )
  (fc1): Linear(in_features=146, out_features=1, bias=True)
)

I trained the RNN and the CNN separately on the same dataset and saved their weights. In the combined model, I load those weights with the following function:

def load_pretrained_weights(self, model='cnn', path=None):
    # Load separately trained weights into the matching submodule.
    if model not in ['cnn', 'rnn']:
        raise ValueError("Model must be either rnn or cnn")
    if model == 'cnn':
        self.cnn.load_state_dict(torch.load(path))
    elif model == 'rnn':
        self.lstm.load_state_dict(torch.load(path))
I then freeze the submodules with this function:

def freeze(self):
    # Stop gradient updates for both pretrained submodules.
    for p in self.cnn.parameters():
        p.requires_grad = False
    for p in self.lstm.parameters():
        p.requires_grad = False
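
Note that setting requires_grad = False only stops gradient updates; it does not change how Dropout layers behave. The sketch below illustrates this with a small hypothetical stand-in module (not the CNNLSTM above): in train mode a frozen module with dropout still returns different outputs for the same input, while after eval() the outputs repeat. Whether this is what happens in the model above is an assumption, since the training and evaluation loops are not shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a frozen submodule that contains dropout.
sub = nn.Sequential(nn.Linear(8, 16), nn.Dropout(p=0.5), nn.Linear(16, 1))
for p in sub.parameters():
    p.requires_grad = False  # freezes the weights only

x = torch.randn(4, 8)

# Train mode: dropout samples a new random mask on every forward pass.
sub.train()
train_a, train_b = sub(x), sub(x)
train_outputs_differ = not torch.equal(train_a, train_b)

# Eval mode: dropout is disabled, so repeated passes give the same output.
sub.eval()
eval_a, eval_b = sub(x), sub(x)
eval_outputs_match = torch.equal(eval_a, eval_b)

print("train outputs differ:", train_outputs_differ)
print("eval outputs match:", eval_outputs_match)
```

If the frozen submodules are meant to act as fixed feature extractors, calling eval() on them (and on the whole model before evaluation) keeps their outputs repeatable.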

Then I trained the combined model and got a better result than either submodule trained and evaluated alone.
I used early stopping in my epoch loop to save the best parameters.
After training, I created a new instance of the same class and loaded the saved "best" parameters, but I do not get a similar result.
When I tried the same thing with each submodule (RNN and CNNText here) alone, it worked. In the combined case, however, the reloaded model does not reach the same performance.
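
One thing worth checking in the early-stopping step is how the "best" parameters are stored. The sketch below assumes they are kept in memory during the epoch loop (the post does not show that code) and uses a hypothetical nn.Linear stand-in. state_dict() returns tensors that share storage with the live parameters, so a plain reference keeps changing as training continues, while a copy.deepcopy snapshot stays fixed and reproduces the best-epoch output after a save/load round trip.

```python
import copy
import io

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model and data, only for illustration.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 4), torch.randn(16, 1)

# Pretend the current epoch is the best one and record its output.
model.eval()
with torch.no_grad():
    best_output = model(x).clone()

best_ref = model.state_dict()                  # shares storage with the model
best_copy = copy.deepcopy(model.state_dict())  # independent snapshot

# Training continues for one more step after the "best" epoch.
model.train()
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

ref_follows_model = torch.equal(best_ref["weight"], model.weight)
copy_kept_best = not torch.equal(best_copy["weight"], model.weight)

# Save the snapshot, load it into a fresh instance, and evaluate in eval mode.
buffer = io.BytesIO()
torch.save(best_copy, buffer)
buffer.seek(0)

restored = nn.Linear(4, 1)
restored.load_state_dict(torch.load(buffer))
restored.eval()
with torch.no_grad():
    restored_output = restored(x)

round_trip_ok = torch.allclose(restored_output, best_output)
print(ref_follows_model, copy_kept_best, round_trip_ok)
```

Writing the state dict to disk with torch.save at the moment the best score is reached avoids the shared-storage issue as well, because the file is a snapshot of the values at that point.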

A few experiments I tried:

  1. I loaded the saved weights of each submodule and then loaded the best parameters; the result was somewhat close to the best one.
  2. I took the hidden layer from each submodule before the dropout is applied; the result was better than in the previous experiment, but still not the best.
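
To narrow down experiments like these, one option is to compare the parameters of the trained instance and the reloaded instance entry by entry. The helper below, diff_state_dicts, is a hypothetical name introduced for this sketch, and the two small Sequential models are stand-ins for the real ones. An empty result means the two instances hold identical parameters, so any remaining difference would have to come from something else, such as train/eval mode or the data pipeline.

```python
import torch
import torch.nn as nn


def diff_state_dicts(a, b):
    """Return sorted names of entries that are missing in one dict or differ in value."""
    names = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            names.append(key)
        elif a[key].shape != b[key].shape or not torch.equal(a[key], b[key]):
            names.append(key)
    return names


torch.manual_seed(0)

# Hypothetical stand-ins for the trained model and a freshly created instance.
trained = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 1))
reloaded = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 1))

# Before loading, the two randomly initialised instances differ everywhere.
before = diff_state_dicts(trained.state_dict(), reloaded.state_dict())

# After loading, no entry should differ.
reloaded.load_state_dict(trained.state_dict())
after = diff_state_dicts(trained.state_dict(), reloaded.state_dict())

print("differing before load:", before)
print("differing after load:", after)
```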

Please help me understand what is happening here. I am new to deep learning concepts.
Thank you.