Saving the model parameters correctly?

Janinanu · January 23, 2018, 7:38pm

Another newbie question here.
My model’s training loss decreases pretty fast, yet the performance on the validation data is very poor. It basically makes random guesses. I already increased the amount of training data heavily to avoid overfitting. I am now wondering whether I am doing something wrong with the way I save/load the model parameters… I looked through some example code, but I didn’t find it very useful for beginners.

Some code snippets:

Training and saving the model:

def train_model(learning_rate, epochs): 
    
     dual_encoder.train()
        
     optimizer = torch.optim.Adam(dual_encoder.parameters(), lr = learning_rate)
       
     loss_func = torch.nn.BCEWithLogitsLoss()
     
     for epoch in range(epochs): 
                             
            context_id_list, response_id_list, label_array = load_ids_and_labels(dataframe, word_to_id)
            
            loss_sum = 0.0
                
            for i in range(len(label_array)):
                context = autograd.Variable(torch.LongTensor(context_id_list[i]).view(-1,1), requires_grad = False)
                
                response = autograd.Variable(torch.LongTensor(response_id_list[i]).view(-1, 1), requires_grad = False)
                
                label = autograd.Variable(torch.FloatTensor(torch.from_numpy(np.array(label_array[i]).reshape(1,1))), requires_grad = False)
                
                score = dual_encoder(context, response)
        
                loss = loss_func(score, label)
                
                loss_sum += loss.data[0]
                
                loss.backward()

                # Clip after backward() and before step(); clipping after zero_grad() has no effect.
                torch.nn.utils.clip_grad_norm(dual_encoder.parameters(), 10)

                optimizer.step()

                optimizer.zero_grad()
                
                
            print("Epoch: ", epoch, ", Loss: ", (loss_sum/len(label_array)))
            
            

train_model(learning_rate = 0.001, epochs = 5)

torch.save(dual_encoder.state_dict(), 'SAVED_MODEL.pt')

Loading the model:

encoder_model = model_and_training.Encoder(
        input_size = model_and_training.emb_dim,
        hidden_size = 200,
        vocab_size = model_and_training.vocab_len)

dual_encoder = model_and_training.DualEncoder(encoder_model)

dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt'))

dual_encoder.eval()
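
A side note on the training loop: gradient clipping only has an effect when it runs after backward() and before optimizer.step(), while the gradients are still populated. Below is a minimal sketch of that ordering. The linear model and random batch are hypothetical stand-ins, not the DualEncoder from this post, and the sketch uses the newer in-place name clip_grad_norm_.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in model and batch, not the DualEncoder from this thread.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = torch.nn.BCEWithLogitsLoss()

inputs = torch.randn(8, 4) * 100.0            # large inputs to provoke large gradients
labels = torch.randint(0, 2, (8, 1)).float()

max_norm = 10.0

optimizer.zero_grad()                          # start from clean gradients
loss = loss_func(model(inputs), labels)
loss.backward()                                # populate .grad on every parameter

# clip_grad_norm_ rescales the gradients in place and returns the norm before clipping.
norm_before = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm))

# Recompute the total norm to confirm the gradients now respect the limit.
norm_after = float(torch.norm(torch.stack([p.grad.norm() for p in model.parameters()])))

optimizer.step()                               # update with the clipped gradients
print("norm before clipping:", norm_before)
print("norm after clipping: ", norm_after)
```

This ordering only affects how large each update is; it would not by itself explain chance-level validation results.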

Janinanu · January 23, 2018, 8:12pm

I just realized that I might be using the wrong approach, according to this explanation.

I actually need to save the whole model, so I simply tried:

torch.save(dual_encoder, 'SAVED_MODEL.pt')

dual_encoder = torch.load('SAVED_MODEL.pt')
dual_encoder.eval()

This gives me a strange error:

File "/home/janinanu/anaconda3/lib/python3.5/inspect.py", line 618, in getfile
    raise TypeError('{!r} is a built-in class'.format(object))

TypeError: <module '__main__'> is a built-in class

SimonW · January 23, 2018, 8:59pm

Your original approach is arguably better. What happens to validation loss if you don’t set .eval()?
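
For context on the .eval() question: eval() switches layers such as dropout and batch norm to inference behavior, so the same input gives the same output on every call. Here is a small sketch of that difference. The toy model is hypothetical, and whether the DualEncoder in this thread contains dropout at all is not shown.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model with dropout; the thread does not show the real architecture.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 10),
    torch.nn.Dropout(p=0.5),
    torch.nn.Linear(10, 1),
)
x = torch.randn(4, 10)

with torch.no_grad():
    model.train()
    train_out_a = model(x)
    train_out_b = model(x)   # a new random dropout mask is drawn on each call

    model.eval()
    eval_out_a = model(x)
    eval_out_b = model(x)    # dropout is disabled, so the output is deterministic

print("train mode outputs identical:", torch.equal(train_out_a, train_out_b))
print("eval mode outputs identical: ", torch.equal(eval_out_a, eval_out_b))
```

If a model has no dropout or batch norm layers, calling eval() should make no difference to its outputs.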

Janinanu · January 23, 2018, 9:27pm

Just tried it. No improvement at all…

SimonW · January 23, 2018, 9:35pm

Could it be overfitting?

Janinanu · January 23, 2018, 9:44pm

I tried it with different amounts of data. Whether I use 1,000, 10,000, or 100,000 training examples, there is no effect whatsoever…

The way I apply the dual_encoder model is roughly like this (see the line score = dual_encoder(context, candidate_response)).

Can I just use the model like this after loading it?

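def validate_model():

    # Assumption: the results dict is created here; the original snippet does not show where it is defined.
    scores_per_example_dict = {}
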
    for example in range(len(id_list_eval_dict['Context'])):
        
        score_per_candidate_dict = {}
 
        for column_name, id_list in sorted(id_list_eval_dict.items()): 

            if column_name != 'Context':
        
                context = autograd.Variable(torch.LongTensor(id_list_eval_dict['Context'][example]).view(len(id_list_eval_dict['Context'][example]),1), requires_grad = False).cuda()
                
                candidate_response = autograd.Variable(torch.LongTensor(id_list_eval_dict[column_name][example]).view(len(id_list_eval_dict[column_name][example]), 1), requires_grad = False).cuda()
    
                score = dual_encoder(context, candidate_response)

                score_sigmoid = torch.sigmoid(score)
                
                score_per_candidate_dict["Score with " + column_name] = score_sigmoid.data[0][0]
        
        scores_per_example_dict[example] = score_per_candidate_dict
    
    return scores_per_example_dict

Janinanu · January 23, 2018, 10:37pm

I just noticed an odd thing. It does not make any difference whether I actually run the line
dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt')) or not.

encoder_model = model_and_training.Encoder(
        input_size = model_and_training.emb_dim,
        hidden_size = 200,
        vocab_size = model_and_training.vocab_len)

dual_encoder = model_and_training.DualEncoder(encoder_model)

#dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt')) 

dual_encoder.eval()

gives me the same validation performance as when I uncomment it…
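
One way to narrow this down is to compare the parameters of a freshly constructed model with the saved checkpoint, before and after calling load_state_dict. The sketch below does that under stated assumptions: TinyModel and state_dicts_equal are hypothetical helpers for illustration, and the file is written to a temporary directory rather than the path used in this thread.

```python
import os
import tempfile

import torch


class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for the DualEncoder in this thread."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 1)

    def forward(self, x):
        return self.linear(x)


def state_dicts_equal(a, b):
    """Return True if both state dicts hold the same keys and tensor values."""
    return a.keys() == b.keys() and all(torch.equal(a[k], b[k]) for k in a)


trained = TinyModel()
with torch.no_grad():
    for p in trained.parameters():
        p.add_(1.0)  # pretend that training moved the weights

path = os.path.join(tempfile.mkdtemp(), "SAVED_MODEL.pt")
torch.save(trained.state_dict(), path)

fresh = TinyModel()              # new random initialization
checkpoint = torch.load(path)    # plain dict of parameter tensors

same_before = state_dicts_equal(fresh.state_dict(), checkpoint)
fresh.load_state_dict(checkpoint)
same_after = state_dicts_equal(fresh.state_dict(), checkpoint)

print("matches checkpoint before loading:", same_before)
print("matches checkpoint after loading: ", same_after)
```

If the fresh model already matched the checkpoint before loading, or if validation results stay the same even though the parameters clearly change, the problem is more likely in how the checkpoint was produced or in the evaluation code than in load_state_dict itself.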

SimonW · January 29, 2018, 4:00pm

Hmm, that is rather interesting. Do you mind sharing the structure of your model? If there is some data I can train on to reproduce the issue, that would be great!

Janinanu · February 2, 2018, 7:10pm

I think it works now, though I honestly cannot point out what the problem was. I changed the general training and validation setup.
Thanks!

SimonW · February 2, 2018, 7:15pm

No worries! Glad to know that it’s working!