How to save an LSTM Seq2Seq network (encoder and decoder) from the example in the tutorials section

I am using the code from here, but I do not know how to save the model (encoder and decoder) so that I do not have to train it every time I want to use it: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#training-and-evaluating

This part is crucial for me: I want to save encoder1 and attn_decoder1 so I can use them later.
How do I do that?

train_now = "yes"
if train_now == "yes": 
    hidden_size = 256
    print("input_lang.n_words",input_lang.n_words)
    print("input_lang.n_words",input_lang.n_words)
    encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
    attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)
    trainIters(encoder1, attn_decoder1, 5000, print_every=50)
    evaluateRandomly(encoder1, attn_decoder1)
else: 
    #load somehow encoder1 and attn_decoder1
    evaluateRandomly(encoder1, attn_decoder1)

Hi! Check out this section on saving multiple models in one file.
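
For reference, here is a minimal sketch of what that section describes, assuming the same encoder1 / attn_decoder1 from the tutorial (the filename and dict keys below are just placeholders):

# Save both state_dicts in a single checkpoint file
torch.save({
    "encoder_state_dict": encoder1.state_dict(),
    "decoder_state_dict": attn_decoder1.state_dict(),
}, "seq2seq_checkpoint.pt")

# Loading: rebuild the models first, then restore their weights
checkpoint = torch.load("seq2seq_checkpoint.pt")
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)
encoder1.load_state_dict(checkpoint["encoder_state_dict"])
attn_decoder1.load_state_dict(checkpoint["decoder_state_dict"])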


Saving should be as simple as:

torch.save(encoder1.state_dict(), 'encoder.dict')
torch.save(attn_decoder1.state_dict(), 'decoder.dict')

And loading:

encoder = EncoderRNN(input_lang.n_words, hidden_size).to(device)
decoder = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

encoder.load_state_dict(torch.load('encoder.dict'))
decoder.load_state_dict(torch.load('decoder.dict'))
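
If you might load on a different device than you trained on (for example GPU training, CPU-only inference), it is probably worth passing map_location; a small sketch, assuming the same device variable as in the tutorial:

encoder.load_state_dict(torch.load('encoder.dict', map_location=device))
decoder.load_state_dict(torch.load('decoder.dict', map_location=device))
encoder.eval()  # disable dropout before evaluation
decoder.eval()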

This works:

train_now = "nyes"  # anything other than "yes" skips training and loads the saved models
if train_now == "yes":
    hidden_size = 256
    print("input_lang.n_words", input_lang.n_words)
    print("output_lang.n_words", output_lang.n_words)
    encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
    attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)
    trainIters(encoder1, attn_decoder1, 500, print_every=50)
    evaluateRandomly(encoder1, attn_decoder1)
    torch.save(encoder1, "encoder1.pt")
    torch.save(attn_decoder1, "attn_decoder1.pt")
else:
    # load the saved models
    encoder1 = torch.load("encoder1.pt")
    encoder1.eval()
    attn_decoder1 = torch.load("attn_decoder1.pt")
    attn_decoder1.eval()
    evaluateRandomly(encoder1, attn_decoder1)

You might want to read this article to compare the differences between saving/loading the entire model (which is what you are doing) and saving/loading only the model's state_dict.
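
Roughly, the difference looks like this (a sketch; the state-dict filename is hypothetical):

# state_dict approach: only tensors are stored, so loading works anywhere
# you can construct the model classes yourself
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
encoder1.load_state_dict(torch.load("encoder1_state.pt", map_location=device))

# whole-model approach (what you are doing): torch.load unpickles the object,
# so the original class definition must be importable from the same module
# path when you load
encoder1 = torch.load("encoder1.pt", map_location=device)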


I read it; as far as I understand, it will not work if I move the model to another directory and load it from there. Is that right?

I am here just for a correction: I believe this is an RNN Seq2Seq network that uses GRU units, not an LSTM Seq2Seq network. GRU units generally have fewer parameters than LSTM units, since they do not have an output gate.