Loss problem in net finetuning

Try adding clip_grad_norm_(). Update your code like this:

from torch.nn.utils import clip_grad_norm_

loss.backward()
clip_grad_norm_(model.parameters(), max_norm=10)  # rescale gradients if their total norm exceeds 10
optimizer.step()
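
For context, here is a minimal training-step sketch showing where the clipping call sits relative to zero_grad(), backward(), and step(). The model, dataloader, and loss_fn names are placeholders for your own objects, not code from your post:

import torch
from torch.nn.utils import clip_grad_norm_

# Placeholder setup: swap in your own model, data, and loss.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for inputs, targets in dataloader:
    optimizer.zero_grad()                              # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)             # forward pass
    loss.backward()                                    # compute gradients
    clip_grad_norm_(model.parameters(), max_norm=10)   # clip before the optimizer update
    optimizer.step()                                   # update parameters with clipped gradients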

I would also suggest referring to my post Efficient train/dev sets evaluation; I think it might help you.