Saving the model parameters correctly?

Another newbie question here.
My model’s training loss decreases pretty fast, yet the performance on the validation data is very poor. It is basically guessing at random. I have already increased the amount of training data heavily to avoid overfitting. I am now wondering whether I am doing something wrong in the way I save/load the model parameters. I looked through some example code, but I didn’t find it very useful for beginners.

Some code snippets:

Training and saving the model:

def train_model(learning_rate, epochs):
    dual_encoder.train()

    optimizer = torch.optim.Adam(dual_encoder.parameters(), lr=learning_rate)
    loss_func = torch.nn.BCEWithLogitsLoss()

    for epoch in range(epochs):
        context_id_list, response_id_list, label_array = load_ids_and_labels(dataframe, word_to_id)

        loss_sum = 0.0

        for i in range(len(label_array)):
            context = autograd.Variable(torch.LongTensor(context_id_list[i]).view(-1, 1), requires_grad=False)
            response = autograd.Variable(torch.LongTensor(response_id_list[i]).view(-1, 1), requires_grad=False)
            label = autograd.Variable(torch.FloatTensor(torch.from_numpy(np.array(label_array[i]).reshape(1, 1))), requires_grad=False)

            score = dual_encoder(context, response)
            loss = loss_func(score, label)
            loss_sum += loss.data[0]

            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

            torch.nn.utils.clip_grad_norm(dual_encoder.parameters(), 10)

        print("Epoch: ", epoch, ", Loss: ", (loss_sum / len(label_array)))


train_model(learning_rate=0.001, epochs=5)

torch.save(dual_encoder.state_dict(), 'SAVED_MODEL.pt')
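
One detail in the training loop may be worth checking: torch.nn.utils.clip_grad_norm is called after optimizer.step() and optimizer.zero_grad(), so by the time it runs the gradients have already been applied and cleared, and the clipping cannot affect the update. This is probably unrelated to saving and loading, but it is easy to fix. Below is a minimal sketch of the usual ordering (backward, clip, step). The small nn.Linear model and the random data are placeholders, not the dual encoder from this thread, and clip_grad_norm_ is the in-place name used by newer PyTorch versions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model and data; stand-ins for the dual encoder and its inputs.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.BCEWithLogitsLoss()

inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8, 1)).float()

max_norm = 10.0
for step in range(3):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    score = model(inputs)
    loss = loss_func(score, labels)
    loss.backward()            # compute fresh gradients
    # Clip while the gradients still exist, before they are applied.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()           # apply the (possibly clipped) gradients
    print(step, loss.item(), float(total_norm))
```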

Loading the model:

encoder_model = model_and_training.Encoder(
        input_size = model_and_training.emb_dim,
        hidden_size = 200,
        vocab_size = model_and_training.vocab_len)

dual_encoder = model_and_training.DualEncoder(encoder_model)

dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt'))

dual_encoder.eval()

I just realized that I might be using the wrong approach according to this explanation

I actually need to save the whole model, so I simply tried:

torch.save(dual_encoder, 'SAVED_MODEL.pt')

dual_encoder = torch.load('SAVED_MODEL.pt')
dual_encoder.eval()

This gives me a strange error:

File "/home/janinanu/anaconda3/lib/python3.5/inspect.py", line 618, in getfile
raise TypeError('{!r} is a built-in class'.format(object))

TypeError: <module '__main__'> is a built-in class

Your original approach is arguably better. What happens to validation loss if you don’t set .eval()?

Just tried it. No improvement at all…

Could it be overfitting?

I tried it with different amounts of data. Whether I use 1,000, 10,000, or 100,000 training examples, there is no effect whatsoever… :no_mouth:

The way I apply the dual_encoder model is roughly like this (see the line score = dual_encoder(context, candidate_response)).

Can I just use the model like this after loading it?

def validate_model(): 
 
  
    for example in range(len(id_list_eval_dict['Context'])):
        
        score_per_candidate_dict = {}
 
        for column_name, id_list in sorted(id_list_eval_dict.items()): 

            if column_name != 'Context':
        
                context = autograd.Variable(torch.LongTensor(id_list_eval_dict['Context'][example]).view(len(id_list_eval_dict['Context'][example]),1), requires_grad = False).cuda()
                
                candidate_response = autograd.Variable(torch.LongTensor(id_list_eval_dict[column_name][example]).view(len(id_list_eval_dict[column_name][example]), 1), requires_grad = False).cuda()
    
                score = dual_encoder(context, candidate_response)

                score_sigmoid = torch.sigmoid(score)
                
                score_per_candidate_dict["Score with " + column_name] = score_sigmoid.data[0][0]
        
        scores_per_example_dict[example] = score_per_candidate_dict
    
    return scores_per_example_dict
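
On whether the model can be used like this after loading: calling eval() and then running the forward pass is the usual pattern. Two things that might be worth checking are that the model and its inputs are on the same device (the validation code calls .cuda() on the inputs, while the training code shown earlier does not), and that gradient tracking is switched off while scoring. The sketch below illustrates both under stated assumptions: the nn.Linear scorer and the candidate names are hypothetical placeholders, not the dual encoder or the columns of id_list_eval_dict.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder scorer; stands in for the loaded dual encoder.
model = nn.Linear(4, 1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # inference mode for layers such as dropout

# Hypothetical candidates; in the thread these would come from id_list_eval_dict.
candidates = {"candidate_a": torch.randn(1, 4), "candidate_b": torch.randn(1, 4)}

scores = {}
with torch.no_grad():  # no autograd graph is built while scoring
    for name, features in sorted(candidates.items()):
        logit = model(features.to(device))  # input moved to the model's device
        scores["Score with " + name] = torch.sigmoid(logit).item()

print(scores)
```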

I just noticed an odd thing. It does not make any difference whether I actually run the line
dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt')) or not.

encoder_model = model_and_training.Encoder(
        input_size = model_and_training.emb_dim,
        hidden_size = 200,
        vocab_size = model_and_training.vocab_len)

dual_encoder = model_and_training.DualEncoder(encoder_model)

#dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt')) 

dual_encoder.eval()

gives me the same validation performance as when I uncomment it…
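
One way to narrow this down might be to check directly whether load_state_dict changes the weights, by comparing the parameters of a freshly constructed model before and after loading. If nothing changes, the checkpoint may have been written from an untrained model; if the weights do change but validation is still at chance level, the problem is more likely in the training or validation setup. The sketch below uses a hypothetical make_model() in place of DualEncoder(Encoder(...)) and fakes the "training" by shifting the weights.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Placeholder architecture; stands in for DualEncoder(Encoder(...)).
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))


torch.manual_seed(0)
trained = make_model()
with torch.no_grad():
    for p in trained.parameters():
        p.add_(1.0)  # pretend that training moved the weights

path = os.path.join(tempfile.mkdtemp(), "SAVED_MODEL.pt")
torch.save(trained.state_dict(), path)

fresh = make_model()  # new random initialisation, as in the loading script
before = {k: v.clone() for k, v in fresh.state_dict().items()}

fresh.load_state_dict(torch.load(path))
after = fresh.state_dict()

# Names of tensors whose values differ from the fresh initialisation.
changed = [k for k in after if not torch.equal(before[k], after[k])]
# True if every loaded tensor equals the one that was saved.
matches_saved = all(torch.equal(after[k], v) for k, v in trained.state_dict().items())

print("tensors changed by loading:", changed)
print("loaded weights equal saved weights:", matches_saved)
```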

Hmm, that is rather interesting. Do you mind sharing the structure of your model? If there is some data I can train on to reproduce the issue, that would be great!

I think it works now, though I honestly cannot point out what the problem was. :woman_shrugging: I changed the general training and validation setup.
Thanks!

No worries! Glad to know that it’s working!