Janinanu
(Janina Nuber)
January 23, 2018, 7:38pm
1
Another newbie question here.
My model’s training loss decreases pretty fast, yet the performance on the validation data is very poor; it is basically guessing at random. I already increased the amount of training data heavily to avoid overfitting. I am now wondering whether I am doing something wrong in the way I save/load the model parameters… I looked through some example code, but I didn’t find it very useful for beginners.
Some code snippets:
Training and saving the model:
def train_model(learning_rate, epochs):
    dual_encoder.train()
    optimizer = torch.optim.Adam(dual_encoder.parameters(), lr = learning_rate)
    loss_func = torch.nn.BCEWithLogitsLoss()
    for epoch in range(epochs):
        context_id_list, response_id_list, label_array = load_ids_and_labels(dataframe, word_to_id)
        loss_sum = 0.0
        for i in range(len(label_array)):
            context = autograd.Variable(torch.LongTensor(context_id_list[i]).view(-1,1), requires_grad = False)
            response = autograd.Variable(torch.LongTensor(response_id_list[i]).view(-1, 1), requires_grad = False)
            label = autograd.Variable(torch.FloatTensor(torch.from_numpy(np.array(label_array[i]).reshape(1,1))), requires_grad = False)
            score = dual_encoder(context, response)
            loss = loss_func(score, label)
            loss_sum += loss.data[0]
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            torch.nn.utils.clip_grad_norm(dual_encoder.parameters(), 10)
        print("Epoch: ", epoch, ", Loss: ", (loss_sum/len(label_array)))
train_model(learning_rate = 0.001, epochs = 5)
torch.save(dual_encoder.state_dict(), 'SAVED_MODEL.pt')
Loading the model:
encoder_model = model_and_training.Encoder(
    input_size = model_and_training.emb_dim,
    hidden_size = 200,
    vocab_size = model_and_training.vocab_len)
dual_encoder = model_and_training.DualEncoder(encoder_model)
dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt'))
dual_encoder.eval()
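One detail in the training loop above: gradient clipping is called after `optimizer.step()` and `optimizer.zero_grad()`, so it runs on gradients that have already been applied and cleared, and has no effect on the update. The thread does not establish that this caused the poor validation results. The sketch below only shows the conventional ordering, using a small `torch.nn.Linear` model and random data as hypothetical stand-ins for `dual_encoder` and the real dataset, and `clip_grad_norm_` (the in-place name used in later PyTorch releases) in place of `clip_grad_norm`.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for dual_encoder and the real training data.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = torch.nn.BCEWithLogitsLoss()
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8, 1)).float()

# Snapshot the parameters so the effect of the update can be checked.
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()                      # 1. clear gradients from the previous step
loss = loss_func(model(inputs), labels)    # 2. forward pass
loss.backward()                            # 3. compute fresh gradients
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)  # 4. clip them
optimizer.step()                           # 5. apply the (clipped) gradients

params_changed = any(
    not torch.equal(old, new.detach())
    for old, new in zip(before, model.parameters())
)
print("gradient norm before clipping:", float(total_norm))
print("parameters changed:", params_changed)
```

Clipping has to sit between `backward()` and `step()` because that is the only point at which the gradients exist and have not yet been used.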
Janinanu
(Janina Nuber)
January 23, 2018, 8:12pm
2
I just realized that I might be using the wrong approach according to this explanation
I actually need to save the whole model, so I simply tried:
torch.save(dual_encoder, 'SAVED_MODEL.pt')
dual_encoder = torch.load('SAVED_MODEL.pt')
dual_encoder.eval()
This gives me a strange error:
File "/home/janinanu/anaconda3/lib/python3.5/inspect.py", line 618, in getfile
    raise TypeError('{!r} is a built-in class'.format(object))
TypeError: <module '__main__'> is a built-in class
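The traceback ends in `inspect.getfile`, which raises this `TypeError` when it is asked for the source file of a class whose defining module has no `__file__` attribute, as is the case for `__main__` in an interactive session or notebook. That the model class here was defined interactively is an assumption, not something the post confirms. The sketch below reproduces the same kind of error with the standard library alone; `fake_interactive` and `Model` are hypothetical names, and the exact message text differs between Python versions.

```python
import inspect
import sys
import types

# Hypothetical module that, like an interactive __main__, has no __file__.
fake_module = types.ModuleType("fake_interactive")
sys.modules["fake_interactive"] = fake_module


class Model:
    """Hypothetical stand-in for a model class defined in a session."""


# Pretend the class was defined in the file-less module.
Model.__module__ = "fake_interactive"

try:
    inspect.getfile(Model)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Saving only the `state_dict`, as in the first post, avoids pickling the class itself and therefore does not depend on locating its source file.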
SimonW
(Simon Wang)
January 23, 2018, 8:59pm
3
Your original approach is arguably better. What happens to the validation loss if you don’t call .eval()?
Janinanu
(Janina Nuber)
January 23, 2018, 9:27pm
4
Just tried it. No improvement at all…
Janinanu
(Janina Nuber)
January 23, 2018, 9:44pm
6
I tried it with different amounts of data. Whether I use 1,000, 10,000, or 100,000 training examples, it has no effect whatsoever…
The way I apply the dual_encoder model is roughly like this (see the line score = dual_encoder(context, candidate_response)).
Can I just use the model like this after loading it?
def validate_model():
    for example in range(len(id_list_eval_dict['Context'])):
        score_per_candidate_dict = {}
        for column_name, id_list in sorted(id_list_eval_dict.items()):
            if column_name != 'Context':
                context = autograd.Variable(torch.LongTensor(id_list_eval_dict['Context'][example]).view(len(id_list_eval_dict['Context'][example]),1), requires_grad = False).cuda()
                candidate_response = autograd.Variable(torch.LongTensor(id_list_eval_dict[column_name][example]).view(len(id_list_eval_dict[column_name][example]), 1), requires_grad = False).cuda()
                score = dual_encoder(context, candidate_response)
                score_sigmoid = torch.sigmoid(score)
                score_per_candidate_dict["Score with " + column_name] = score_sigmoid.data[0][0]
        scores_per_example_dict[example] = score_per_candidate_dict
    return scores_per_example_dict
Janinanu
(Janina Nuber)
January 23, 2018, 10:37pm
7
I just noticed an odd thing. It does not make any difference whether I actually run the line
dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt'))
or not.
encoder_model = model_and_training.Encoder(
    input_size = model_and_training.emb_dim,
    hidden_size = 200,
    vocab_size = model_and_training.vocab_len)
dual_encoder = model_and_training.DualEncoder(encoder_model)
#dual_encoder.load_state_dict(torch.load('SAVED_MODEL.pt'))
dual_encoder.eval()
gives me the same validation performance as when I uncomment it…
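If loading the checkpoint makes no difference, one way to narrow things down is to check directly whether `load_state_dict` changes the model's outputs. The sketch below does this with a small hypothetical network (`make_model` stands in for building `DualEncoder`) and an in-memory buffer instead of `SAVED_MODEL.pt`; "training" is faked by shifting the weights. It is a diagnostic under those assumptions, not a reconstruction of the setup in this thread.

```python
import io

import torch


def make_model():
    # Hypothetical stand-in for constructing the DualEncoder.
    return torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 1),
    )


torch.manual_seed(0)

# Pretend this model was trained: shift every parameter away from its init.
trained = make_model()
with torch.no_grad():
    for param in trained.parameters():
        param.add_(1.0)

# Save only the state_dict, as in the original approach.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# A freshly constructed model starts from its own random initialization.
fresh = make_model()
fresh.eval()
trained.eval()
x = torch.randn(3, 4)

with torch.no_grad():
    differs_before = not torch.allclose(fresh(x), trained(x))
    fresh.load_state_dict(torch.load(buffer))
    matches_after = torch.allclose(fresh(x), trained(x))

print("outputs differ before loading:", differs_before)
print("outputs match after loading:", matches_after)
```

If a check like this passes on the real model yet validation is still at chance level with or without the checkpoint, the saved weights themselves, or the way validation inputs are built, become the more likely suspects.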
SimonW
(Simon Wang)
January 29, 2018, 4:00pm
8
Hmm, that is rather interesting. Do you mind sharing the structure of your model? If there is some data I can train on to reproduce the issue, that would be great!
Janinanu
(Janina Nuber)
February 2, 2018, 7:10pm
9
I think it works now, though I honestly cannot point out what the problem was. I changed the general training and validation setup.
Thanks!
SimonW
(Simon Wang)
February 2, 2018, 7:15pm
10
No worries! Glad to know that it’s working!