Hi,

I’m new to PyTorch and I’m training an encoder-decoder LSTM for abstractive text summarization. The problem is that the loss stays almost flat during training.

My configurations are:

```
VOCAB_SIZE = 50000
BATCH_SIZE = 16
EMBED_DIM = 200
HIDDEN_DIM = 768
DECODER_NUM_LAYERS = 2
MAX_ENCODER_LEN = 350
MAX_DECODER_LEN = 80
```

The loss calculation is as follows:

```
# y_hat: network output, shape [batch_size, seq_len, vocab_size]
# y: target, shape [batch_size, target_seq_len]
# target_seq_len equals seq_len (right padding is used)
# I'm not sure whether this method is correct -- please correct me if I'm wrong.
loss = 0
for j in range(y.size(1)):
    y_hat_step = y_hat[:, j]  # [batch_size, vocab_size]
    y_step = y[:, j]          # [batch_size]
    loss += loss_function(y_hat_step, y_step)
loss.backward()
optimizer.step()
```
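For reference, the same loss can also be computed in a single call by flattening the batch and time dimensions, since `CrossEntropyLoss` accepts `[N, C]` logits and `[N]` targets. A small sketch with toy sizes (the real ones are 16 / 80 / 50000); note the loop above sums per-step means, so with no padded positions it equals `seq_len` times the flattened mean:

```python
import torch
from torch.nn import CrossEntropyLoss

loss_function = CrossEntropyLoss(ignore_index=0, reduction='mean')

# Toy sizes for illustration
batch_size, seq_len, vocab_size = 4, 6, 100
y_hat = torch.randn(batch_size, seq_len, vocab_size)
y = torch.randint(1, vocab_size, (batch_size, seq_len))  # no padding here

# Per-step loop, as above: sums seq_len per-step means
loop_loss = sum(loss_function(y_hat[:, j], y[:, j]) for j in range(seq_len))

# Single call over all timesteps: flatten to [batch*seq, vocab] and [batch*seq]
flat_loss = loss_function(y_hat.reshape(-1, vocab_size), y.reshape(-1))

# With no padded targets, loop_loss == seq_len * flat_loss
```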

Loss function and optimizer:

```
loss_function = CrossEntropyLoss(ignore_index=0, reduction='mean')
optimizer = Adam(model.parameters(), lr=0.01)
#also tried lr=0.001 and lr=0.0001
```
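As a sanity check on `ignore_index=0`: positions whose target equals the padding index are excluded from the mean entirely. A minimal sketch:

```python
import torch
from torch.nn import CrossEntropyLoss

logits = torch.randn(3, 5)         # 3 positions, 5-class toy vocab
targets = torch.tensor([2, 0, 4])  # middle position is padding (index 0)

# Mean over the 2 non-padded positions only
masked = CrossEntropyLoss(ignore_index=0)(logits, targets)

# Without ignore_index, the mean is taken over all 3 positions
unmasked = CrossEntropyLoss()(logits, targets)
```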

My model:

```
EncoderDecoder(
  (encoder): Encoder(
    (embedding_layer): Embedding(50000, 200, padding_idx=0)
    (dropout): Dropout(p=0.3, inplace=False)
    (lstm_layer): LSTM(200, 768, batch_first=True, dropout=0.3, bidirectional=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(50000, 200, padding_idx=0)
    (dropout): Dropout(p=0.3, inplace=False)
    (lstm): LSTM(200, 768, num_layers=2, batch_first=True, dropout=0.3)
    (output): Linear(in_features=768, out_features=50000, bias=True)
  )
)
```
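To make the wiring concrete, here is a simplified version of the model matching that printout. The `bridge_h`/`bridge_c` projections are a paraphrase rather than my exact code: the bidirectional encoder produces states of shape `[2, batch, 768]`, and *some* mapping to the two-layer decoder's `[2, batch, 768]` initial state is needed, so a linear projection of the concatenated directions is shown as one plausible choice:

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, DECODER_NUM_LAYERS = 50000, 200, 768, 2

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding_layer = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=0)
        self.dropout = nn.Dropout(p=0.3)
        # Note: LSTM-internal dropout has no effect with num_layers=1
        # (PyTorch warns about it), so it is omitted here
        self.lstm_layer = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True,
                                  bidirectional=True)

    def forward(self, src):
        out, (h, c) = self.lstm_layer(self.dropout(self.embedding_layer(src)))
        return out, (h, c)  # h, c: [2, batch, HIDDEN_DIM]

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=0)
        self.dropout = nn.Dropout(p=0.3)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=DECODER_NUM_LAYERS,
                            batch_first=True, dropout=0.3)
        self.output = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, tgt, state):
        out, state = self.lstm(self.dropout(self.embedding(tgt)), state)
        return self.output(out), state  # logits: [batch, tgt_len, VOCAB_SIZE]

class EncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()
        # Assumed bridge: project the concatenated forward/backward states
        # (2 * HIDDEN_DIM) to HIDDEN_DIM, then repeat for each decoder layer
        self.bridge_h = nn.Linear(2 * HIDDEN_DIM, HIDDEN_DIM)
        self.bridge_c = nn.Linear(2 * HIDDEN_DIM, HIDDEN_DIM)

    def forward(self, src, tgt):
        _, (h, c) = self.encoder(src)
        h = torch.cat([h[0], h[1]], dim=-1)  # [batch, 2*HIDDEN_DIM]
        c = torch.cat([c[0], c[1]], dim=-1)
        h = self.bridge_h(h).unsqueeze(0).repeat(DECODER_NUM_LAYERS, 1, 1)
        c = self.bridge_c(c).unsqueeze(0).repeat(DECODER_NUM_LAYERS, 1, 1)
        logits, _ = self.decoder(tgt, (h, c))
        return logits  # [batch, tgt_len, VOCAB_SIZE]
```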