I’m developing an LSTM autoencoder to encode text data. I’ve trained Flair's DocumentRNNEmbeddings to embed sequences of sentences and I’m using the saved model. Each batch of my training data has shape [12, 5, 2048]; i.e., each sample contains 5 sentences, each with an embedding of dimension 2048.
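For reference, a batch of that shape can be reproduced with random data (the tensor here is just illustrative, not my real embeddings):

```python
import torch

# One batch: 12 samples, each a sequence of 5 sentence embeddings of size 2048
batch = torch.randn(12, 5, 2048)
print(batch.shape)  # torch.Size([12, 5, 2048])
```

With `batch_first=True`, this is exactly the `(batch, seq_len, input_size)` layout `nn.LSTM` expects.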
PROBLEM
I’ve tried both BCEWithLogitsLoss and MSELoss with the Adam optimizer. In both cases my training loss was fairly good: 0.31 on average with BCE and 0.011 on average with MSE. But my validation loss mostly ends up being inf, or in some rare cases really huge: 8.025489465e+32. I’m just trying to validate my autoencoder approach, so my entire training set contains only 335 sentences (6 batches) and my test set 60 sentences (only 2 batches). I’m not sure whether that’s what is causing the issue, or whether there’s something wrong with the way I’m performing the forward pass.
Important parts of the network initialization:
self.lstm_enc = nn.LSTM(self.input_dim, self.hidden_dim,
                        self.num_layers, batch_first=True,
                        dropout=self.dropout_rate,
                        bidirectional=self.bidirectional)
# The decoder consumes the bidirectional encoder output of size hidden_dim * 2
self.lstm_dec = nn.LSTM(self.hidden_dim * 2, self.output_dim // 2,
                        self.num_layers, batch_first=True,
                        dropout=self.dropout_rate,
                        bidirectional=self.bidirectional)
self.hidden_rep = get_hidden_rep
self.activation = nn.ELU()
self.hidden_enc_weights = self._init_hidden_enc_weights()
self.hidden_dec_weights = self._init_hidden_dec_weights()
# self.loss_func = nn.BCEWithLogitsLoss()
self.loss_func = nn.MSELoss(reduction='mean')
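`_init_hidden_enc_weights` isn't shown above; a typical implementation returns zero-initialized `(h_0, c_0)` tensors whose first dimension accounts for both `num_layers` and the number of directions. A sketch under that assumption (the function name and signature here are illustrative):

```python
import torch

def init_hidden(num_layers, bidirectional, batch_size, hidden_dim):
    """Zero-initialized (h_0, c_0) for an nn.LSTM.

    nn.LSTM expects each state with shape
    (num_layers * num_directions, batch, hidden_size).
    """
    num_directions = 2 if bidirectional else 1
    h_0 = torch.zeros(num_layers * num_directions, batch_size, hidden_dim)
    c_0 = torch.zeros(num_layers * num_directions, batch_size, hidden_dim)
    return (h_0, c_0)
```

Note that because the batch size is baked into these shapes, reusing a stored state with a differently sized validation batch will fail or misbehave.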
Here’s what my forward pass looks like:
encoder_output, self.hidden_enc_weights = self.lstm_enc(X_batch, self.hidden_enc_weights)
# Detach the states so gradients don't flow across batches, then clamp in place.
# Note: clamp() without assignment is a no-op; clamp_() actually modifies the tensor.
self.hidden_enc_weights[0].detach_().clamp_(min=1e-1)
self.hidden_enc_weights[1].detach_().clamp_(min=1e-1)
encoder_output = self.activation(encoder_output)
if self.step_through_linear:
    encoder_output = self.step_linear(encoder_output)
    encoder_output = self.activation(encoder_output)
decoder_output, self.hidden_dec_weights = self.lstm_dec(encoder_output, self.hidden_dec_weights)
self.hidden_dec_weights[0].detach_().clamp_(min=0)
self.hidden_dec_weights[1].detach_().clamp_(min=0)
if self.hidden_rep:
    return encoder_output
else:
    decoder_output = self.activation(decoder_output)
    # return self.loss_func(decoder_output, torch.flip(X_batch, [0]))  # BCEWithLogitsLoss
    return self.loss_func(decoder_output, X_batch)  # MSELoss expects (input, target)
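One thing worth checking: because the hidden states are stored on `self` and carried from batch to batch, state accumulated during training (or a validation batch of a different size) can destabilize the outputs. A common pattern is to start each batch from a fresh zero state and to run validation under `torch.no_grad()` with the model in eval mode, so dropout is disabled. A minimal sketch of that idea, using a plain `nn.LSTM` as a stand-in for the full autoencoder:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=2048, hidden_size=256, num_layers=1, batch_first=True)
loss_func = nn.MSELoss(reduction='mean')

def eval_batch_loss(model, batch):
    """Validation step: eval mode + no_grad, fresh (default zero) hidden state."""
    model.eval()
    with torch.no_grad():
        # Passing no hidden state makes nn.LSTM start from zeros for this batch
        output, _ = model(batch)
        # Toy reconstruction target, just to exercise the loss here
        return loss_func(output, torch.zeros_like(output)).item()

val_batch = torch.randn(12, 5, 2048)
loss = eval_batch_loss(lstm, val_batch)
```

If the validation loss stays finite with fresh per-batch states but explodes with carried-over states, that narrows the problem down considerably.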
I’ve applied some additional techniques, such as reversing the input data when computing the loss, initializing with Xavier weights, and clamping the encoder and decoder hidden states. But I don’t think these would affect my loss negatively.
WHAT I’VE TRIED TO COUNTER
I did find similar issues, such as: Similar Issue Infinite Loss. I implemented some checks from the suggested answers, such as checking for inf values in my training and validation data, e.g.:
torch.isfinite(eval_batch).all().item()
The return value always indicated that there were no inf values in either set. So I’m not able to figure out what’s causing the issue. Any help is much appreciated.
P.S.: I recently moved to PyTorch from Keras.
TIA!