I’m new to PyTorch.
I have a regression task and I use a model that receives two different sequential inputs, applies an LSTM to each input separately, concatenates the last hidden state of each LSTM, and predicts a value with a linear layer of out_size 1 (my forward() function is written below).
I’m using gradient accumulation as explained here: [How to implement accumulated gradient？] (the second option), so my model receives a single sample in each forward() call.
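To make that concrete, my training loop looks roughly like this (a simplified sketch, not my exact code; `accumulation_steps`, `loss_fn`, and `train_data` are placeholders):

```python
optimizer.zero_grad()
for i, (input1, input2, target) in enumerate(train_data):
    prediction = model(input1, input2)                        # one sample per forward() call
    loss = loss_fn(prediction, target) / accumulation_steps   # scale so the accumulated sum matches a batch average
    loss.backward()                                           # gradients accumulate in the .grad buffers
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()                                      # update once every accumulation_steps samples
        optimizer.zero_grad()
```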
I want to add normalization to my model.
- Is there a problem with adding batch normalization, given that I’m using gradient accumulation and each forward() call sees only a single sample?
- Should I add batch normalization or layer normalization?
- Where in my model should I add the normalization: before or after the LSTMs?
- To which part of the model should I add it: to the input1 and input2 branches separately, after the concatenation, or in both places? (Roughly as in the sketch below my forward().)
My forward function in the model:
```python
def forward(self, input1, input2):
    # input1 part
    embeds = self.word_embedding(input1)    # GloVe word embedding
    encoder1_out = self.encoder1(embeds)    # BiLSTM
    attention_out = self.HAN(encoder1_out)  # hierarchical attention network

    # input2 part
    encoder2_out = self.encoder2(input2)    # BiLSTM

    # combined part
    info_vector = torch.cat((attention_out, torch.flatten(encoder2_out).unsqueeze(0)), dim=1)

    # [1, hidden_dim_1 + flattened_hidden_dim_2] -> [1, 1]
    return self.linear(info_vector)
```
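For concreteness, this is roughly where I had imagined adding normalization, using nn.LayerNorm as an example. It is only a sketch: `norm1`, `norm2`, and `norm_combined` are hypothetical layers I would create in `__init__`, the dims are placeholders for my real sizes, and I don’t know which of these placements, if any, makes sense.

```python
# hypothetical additions in __init__ (dims are placeholders):
#   self.norm1 = nn.LayerNorm(hidden_dim_1)
#   self.norm2 = nn.LayerNorm(flattened_hidden_dim_2)
#   self.norm_combined = nn.LayerNorm(hidden_dim_1 + flattened_hidden_dim_2)

def forward(self, input1, input2):
    # input1 branch
    embeds = self.word_embedding(input1)
    encoder1_out = self.encoder1(embeds)
    attention_out = self.HAN(encoder1_out)
    attention_out = self.norm1(attention_out)       # placement (a): normalize each branch separately

    # input2 branch
    encoder2_out = self.encoder2(input2)
    encoder2_flat = torch.flatten(encoder2_out).unsqueeze(0)
    encoder2_flat = self.norm2(encoder2_flat)       # placement (a), second branch

    # combined
    info_vector = torch.cat((attention_out, encoder2_flat), dim=1)
    info_vector = self.norm_combined(info_vector)   # placement (b): normalize after concatenation

    return self.linear(info_vector)
```

(With a batch size of 1 in each forward() call, I assume nn.BatchNorm1d would behave differently, which is part of why I’m asking about layer vs. batch normalization.)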