Hi,
I'm new to PyTorch and I'm training an encoder-decoder LSTM for abstractive text summarization. The problem is that the loss stays almost flat during training.
My configuration is:
VOCAB_SIZE = 50000
BATCH_SIZE = 16
EMBED_DIM = 200
HIDDEN_DIM = 768
DECODER_NUM_LAYERS = 2
MAX_ENCODER_LEN = 350
MAX_DECODER_LEN = 80
The loss calculation is as follows:
loss = 0
'''
The network output is y_hat of shape [batch_size, seq_len, vocab_size];
the target is y of shape [batch_size, target_seq_len].
Note that target_seq_len is equal to seq_len (right padding is used).
I'm not sure whether this method is correct or not, please correct
me if I'm wrong.
'''
for j in range(y.size(1)):
    y_hat_step = y_hat[:, j]  # logits for step j: [batch_size, vocab_size]
    y_step = y[:, j]          # targets for step j: [batch_size]
    loss += loss_function(y_hat_step, y_step)
    del y_hat_step
    del y_step
optimizer.zero_grad()  # reset gradients from the previous batch; without this they accumulate
loss.backward()
optimizer.step()
Loss function and optimizer:
loss_function = CrossEntropyLoss(ignore_index=0, reduction='mean')
optimizer = Adam(model.parameters(), lr=0.01)
# also tried lr=0.001 and lr=0.0001
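As a side note, the per-step loop can be collapsed into a single `CrossEntropyLoss` call by flattening the batch and time dimensions. This is a sketch with made-up shapes and random tensors (not your actual data); note that the result averages over all non-padding tokens at once, so its scale differs from the loop version, which sums per-step means:

```python
import torch
from torch.nn import CrossEntropyLoss

# Hypothetical shapes mirroring the post: [batch, seq_len, vocab]
batch, seq_len, vocab = 4, 5, 10
y_hat = torch.randn(batch, seq_len, vocab)
y = torch.randint(1, vocab, (batch, seq_len))
y[:, -2:] = 0  # simulate right padding with index 0

loss_function = CrossEntropyLoss(ignore_index=0, reduction='mean')

# Flatten batch and time, then apply the loss once:
# logits -> [batch * seq_len, vocab], targets -> [batch * seq_len].
# Padding positions are skipped via ignore_index=0.
loss = loss_function(y_hat.reshape(-1, vocab), y.reshape(-1))
```

Besides being shorter, this avoids building a long autograd graph one timestep at a time, so it is typically faster as well.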
My model:
EncoderDecoder(
  (encoder): Encoder(
    (embedding_layer): Embedding(50000, 200, padding_idx=0)
    (dropout): Dropout(p=0.3, inplace=False)
    (lstm_layer): LSTM(200, 768, batch_first=True, dropout=0.3, bidirectional=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(50000, 200, padding_idx=0)
    (dropout): Dropout(p=0.3, inplace=False)
    (lstm): LSTM(200, 768, num_layers=2, batch_first=True, dropout=0.3)
    (output): Linear(in_features=768, out_features=50000, bias=True)
  )
)
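One thing worth checking with this architecture: the encoder is bidirectional, so its final hidden/cell states have shape [2, batch, 768] (forward and backward directions), which doesn't directly match what the 2-layer unidirectional decoder expects as its initial state. If the decoder is currently started from zeros or from a mismatched tensor, a flat loss would be unsurprising. Below is one common way to bridge the two; the `bridge` layer and `init_decoder_state` function are my own illustrative names, not part of your code:

```python
import torch
import torch.nn as nn

HIDDEN_DIM = 768
DECODER_NUM_LAYERS = 2

# Project the concatenated forward+backward encoder state down to the
# decoder's hidden size.
bridge = nn.Linear(2 * HIDDEN_DIM, HIDDEN_DIM)

def init_decoder_state(h_n, c_n):
    # h_n / c_n from a 1-layer bidirectional LSTM: [2, batch, hidden]
    # Concatenate the two directions -> [batch, 2 * hidden]
    h_cat = torch.cat([h_n[0], h_n[1]], dim=-1)
    c_cat = torch.cat([c_n[0], c_n[1]], dim=-1)
    h0 = torch.tanh(bridge(h_cat))
    c0 = torch.tanh(bridge(c_cat))
    # Replicate across decoder layers -> [num_layers, batch, hidden]
    h0 = h0.unsqueeze(0).repeat(DECODER_NUM_LAYERS, 1, 1)
    c0 = c0.unsqueeze(0).repeat(DECODER_NUM_LAYERS, 1, 1)
    return h0, c0
```

The returned pair can be passed as the initial `(h_0, c_0)` of the decoder LSTM. Sharing one bridge for both layers is just one design choice; a separate projection per layer also works.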