Loss is not decreasing while training LSTM Encoder-Decoder

Hi,
I’m new to PyTorch and I’m training an encoder-decoder LSTM for abstractive text summarization. The problem is that the loss stays almost flat during training.

My configuration is:

VOCAB_SIZE = 50000
BATCH_SIZE = 16
EMBED_DIM = 200
HIDDEN_DIM = 768
DECODER_NUM_LAYERS = 2
MAX_ENCODER_LEN = 350
MAX_DECODER_LEN = 80

The loss is computed as follows:

loss = 0

'''
The network output y_hat has shape [batch_size, seq_len, vocab_size]
and the target y has shape [batch_size, target_seq_len].
Note that target_seq_len is equal to seq_len (right padding is used).
I'm not sure whether this method is correct, please correct me if I'm wrong.
'''

for j in range(y.size(1)):
    # accumulate the cross-entropy of each decoding step over the whole target sequence
    y_hat_step = y_hat[:, j]   # [batch_size, vocab_size] logits for step j
    y_step = y[:, j]           # [batch_size] target token ids for step j
    loss += loss_function(y_hat_step, y_step)

    del y_hat_step
    del y_step

loss.backward()
optimizer.step()
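
For reference, the same loss can also be computed in a single call by flattening the batch and time dimensions; this is just an alternative sketch (note that summing per-step means, as in the loop above, weights the steps slightly differently than taking one mean over all non-padding tokens):

# Single-call alternative (sketch): CrossEntropyLoss expects [N, C] logits
# and [N] targets, so flatten [batch, seq_len, vocab] -> [batch*seq_len, vocab]
# and [batch, seq_len] -> [batch*seq_len]; padding is still skipped via ignore_index=0.
loss = loss_function(y_hat.reshape(-1, y_hat.size(-1)), y.reshape(-1))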
            

Loss function and optimizer:

loss_function = CrossEntropyLoss(ignore_index=0, reduction='mean')
optimizer = Adam(model.parameters(), lr=0.01)

#also tried lr=0.001 and lr=0.0001
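
For context, a minimal sketch of how one training step is meant to fit together; the forward call and the optimizer.zero_grad() call are not part of the snippet above, so the input names below are placeholders:

optimizer.zero_grad()               # reset gradients before every step
y_hat = model(src_ids, tgt_in_ids)  # placeholder forward call -> [batch, seq_len, vocab_size]
# ... loss computed as shown earlier ...
loss.backward()
optimizer.step()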

My model:

EncoderDecoder(
  (encoder): Encoder(
    (embedding_layer): Embedding(50000, 200, padding_idx=0)
    (dropout): Dropout(p=0.3, inplace=False)
    (lstm_layer): LSTM(200, 768, batch_first=True, dropout=0.3, bidirectional=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(50000, 200, padding_idx=0)
    (dropout): Dropout(p=0.3, inplace=False)
    (lstm): LSTM(200, 768, num_layers=2, batch_first=True, dropout=0.3)
    (output): Linear(in_features=768, out_features=50000, bias=True)
  )
)
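
The code that initializes the decoder from the encoder's final states isn't shown above. For reference, a rough sketch of one common way to bridge the bidirectional encoder state (shape [2, batch, 768], one slice per direction) into the 2-layer unidirectional decoder looks like this; the bridge_h/bridge_c layers are hypothetical and not part of the model printout:

import torch
import torch.nn as nn

# Hypothetical bridge layers: project the concatenated forward/backward
# final states down to the decoder's hidden size.
bridge_h = nn.Linear(2 * 768, 768)
bridge_c = nn.Linear(2 * 768, 768)

def init_decoder_state(h_n, c_n, num_decoder_layers=2):
    # h_n, c_n: [num_directions=2, batch, 768] from the bidirectional encoder
    h_cat = torch.cat([h_n[0], h_n[1]], dim=-1)             # [batch, 1536]
    c_cat = torch.cat([c_n[0], c_n[1]], dim=-1)             # [batch, 1536]
    h0 = torch.tanh(bridge_h(h_cat))                        # [batch, 768]
    c0 = bridge_c(c_cat)                                    # [batch, 768]
    # reuse the same state as the initial state of every decoder layer
    h0 = h0.unsqueeze(0).repeat(num_decoder_layers, 1, 1)   # [2, batch, 768]
    c0 = c0.unsqueeze(0).repeat(num_decoder_layers, 1, 1)
    return h0, c0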