Why do I get such a loss result?

I am training a seq2seq model using SGD as the optimizer with momentum=0.9.

At the beginning the loss decreases as expected, but then it suddenly increases, even beyond the initial loss.

BTW: how can I upload an image? Does it depend on the user level? A plot of the loss would show the result.

Your learning rate might be too high, so that your model catapults itself out of a good region for its parameters.
Could you give some information on your training procedure?
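
In the meantime, a quick thing to try is dropping the learning rate by, say, a factor of 10 and watching whether the blow-up still happens. A minimal sketch, assuming an SGD setup along the lines of the seq2seq tutorials (the variable names and values here are placeholders, not your actual settings):

import torch.optim as optim

# Hypothetical values: try a noticeably smaller step size first.
learning_rate = 0.001          # e.g. 10x smaller than the current setting
encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate, momentum=0.9)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate, momentum=0.9)

# Optionally decay the rate further as training progresses
# (call scheduler.step() once per epoch).
encoder_scheduler = optim.lr_scheduler.StepLR(encoder_optimizer, step_size=5, gamma=0.5)
decoder_scheduler = optim.lr_scheduler.StepLR(decoder_optimizer, step_size=5, gamma=0.5)

If a smaller constant rate already stabilizes training, a scheduler like the one above lets you keep a larger rate early on and decay it later.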

I used SGD as the optimizer for both the encoder and the decoder:
encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate, momentum=.9)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate, momentum=.9)

I used NLLLoss as the loss function:
criterion = nn.NLLLoss()
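
Note that nn.NLLLoss expects log-probabilities, so the decoder output passed to the criterion has to come from a log_softmax layer. A minimal sketch of such an output head (the class and layer names here are my own, not the actual decoder):

import torch.nn as nn
import torch.nn.functional as F

class DecoderOutputHead(nn.Module):
    # Hypothetical final projection of a decoder: maps the hidden state to
    # vocabulary log-probabilities, which is what nn.NLLLoss expects.
    def __init__(self, hidden_size, vocab_size):
        super(DecoderOutputHead, self).__init__()
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_state):
        return F.log_softmax(self.out(hidden_state), dim=1)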

Then the training loop runs:

for epoch …:
    for each sample …:
        loss = train(source, target, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)

Inside train(…):

encoder_optimizer.zero_grad()
decoder_optimizer.zero_grad()

encoder_hidden = encoder.initHidden()  # assuming the encoder provides a fresh initial hidden state
loss = 0                               # loss is accumulated over all decoder time steps

# encoder_outputs is assumed to be pre-allocated, e.g. a (max_length x hidden_size) zero tensor
for ei in range(input_length):
    encoder_output, encoder_hidden = encoder(input_variable[ei], encoder_hidden)
    encoder_outputs[ei] = encoder_output[0][0]

decoder_input = Variable(torch.LongTensor([[SOS_token]]))  # SOS_token: start-of-sequence index, assumed defined elsewhere
decoder_input = decoder_input.cuda() if use_cuda else decoder_input
decoder_hidden = encoder_hidden  # the last hidden state of the encoder is used as the initial hidden state of the decoder

use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

if use_teacher_forcing:
    # Teacher forcing: feed the ground-truth token as the next decoder input
    for di in range(target_length):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        loss += criterion(decoder_output, target_variable[di])
        decoder_input = target_variable[di]  # teacher forcing
else:
    # No teacher forcing: feed the decoder's own prediction as the next input
    for di in range(target_length):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        topv, topi = decoder_output.data.topk(1)
        ni = topi[0][0]

        decoder_input = Variable(torch.LongTensor([[ni]]))
        decoder_input = decoder_input.cuda() if use_cuda else decoder_input

        loss += criterion(decoder_output, target_variable[di])

loss.backward()
encoder_optimizer.step()
decoder_optimizer.step()
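
One remark on the loop above: the accumulated loss is a sum over all target_length decoder steps, so its gradient grows with sequence length, and under momentum a few long or hard samples can produce very large updates. A possible safeguard, sketched here rather than taken from the code above, is to clip the gradient norm before the optimizer steps and to report the per-token loss:

import torch

# ... inside train(), after loss.backward():
torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=5.0)  # clip_grad_norm (no underscore) on older PyTorch
torch.nn.utils.clip_grad_norm_(decoder.parameters(), max_norm=5.0)
encoder_optimizer.step()
decoder_optimizer.step()

# Report the average loss per target token rather than the raw sum,
# so losses from sequences of different lengths are comparable.
return loss.item() / target_length  # or loss.data[0] / target_length on older PyTorch versions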