Hello,
I’m new to PyTorch.
I have a regression task and I use a model that receives two different sequential inputs, applies an LSTM to each input separately, concatenates the last hidden state of each LSTM, and predicts a value using a linear layer with an output size of 1 (my forward() function is written below).
I’m using gradient accumulation as explained here: [How to implement accumulated gradient?] (the second option), so my model receives a single sample in each forward() call.
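For context, the accumulation pattern I mean can be sketched roughly as follows. This is only a minimal illustration, not my actual training code: the toy `nn.Linear` model, the random data, and the names `accum_steps` and `num_updates` are all placeholders I made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the real model (assumption): maps 4 features to 1 value.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Hypothetical data: 8 samples, each passed to the model on its own.
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

accum_steps = 4   # hypothetical: number of samples accumulated per update
num_updates = 0   # counts how many optimizer steps were taken

optimizer.zero_grad()
for i, (x, y) in enumerate(zip(inputs, targets)):
    pred = model(x.unsqueeze(0))            # single sample, shape [1, 4]
    loss = loss_fn(pred, y.unsqueeze(0))
    # Scale the loss so the accumulated gradient is an average over the window.
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()                    # one update per accum_steps samples
        optimizer.zero_grad()
        num_updates += 1
```

The point is that every forward() call sees a batch of size 1, even though each parameter update reflects several samples.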
I want to add normalization to my model.
- Is there a problem with adding batch normalization, given that I’m accumulating gradients?
- Should I add batch normalization or layer normalization?
- Where in my model should I add the normalization: before or after the LSTM?
- To which part of the model should I add it: to input1 and input2 separately, after the concatenation, or in both places?
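To illustrate the concern behind the first question, here is a small sketch of what the two layers do with a single sample. The feature size of 6 and the variable names are arbitrary choices for the example, not taken from my model. As far as I can tell, `nn.BatchNorm1d` in training mode refuses a 2D input with batch size 1, while `nn.LayerNorm` normalizes each sample on its own.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 6)  # one sample with 6 features (hypothetical size)

# Batch normalization needs more than one value per channel in training mode.
batch_norm = nn.BatchNorm1d(6)
batch_norm.train()
try:
    batch_norm(x)
    batch_norm_failed = False
except ValueError:
    batch_norm_failed = True

# Layer normalization computes statistics over the features of each sample,
# so the batch size does not matter.
layer_norm = nn.LayerNorm(6)
out = layer_norm(x)
```

Here `batch_norm_failed` ends up `True`, while `out` keeps the shape `[1, 6]`.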
My forward function in the model:
def forward(self, input1, input2):
    # input1 part
    embeds = self.word_embedding(input1)  # GloVe word embedding
    encoder1_out = self.encoder1(embeds)  # BiLSTM
    attention_out = self.HAN(encoder1_out)  # hierarchical attention network
    # input2 part
    encoder2_out = self.encoder2(input2)  # BiLSTM
    # combined part: attention output of input1 + flattened encoder output of input2
    info_vector = torch.cat((attention_out, torch.flatten(encoder2_out).unsqueeze(0)), dim=1)
    return self.linear(info_vector)  # [1, hidden_dim_1 + flatten_hidden_dim_2] -> [1]
Thank you!
Almog