Is this the right way to average the hidden states h and c over the directions? The resulting tensors should have shape (num_layers, batch, hidden_units), with the hidden units averaged across the directions.
product_lstm_output, (product_h, product_c) = self.product_lstm(product_inputs)
# split the leading axis into (layers, directions)
product_h = product_h.view(self.number_of_layers, self.directions, self.batch, self.hidden_units)
product_c = product_c.view(self.number_of_layers, self.directions, self.batch, self.hidden_units)
# average over the direction axis
product_h = torch.mean(product_h, dim=1)
product_c = torch.mean(product_c, dim=1)
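
One way to check this is with a small standalone sketch. It relies on the layout described in the `nn.LSTM` documentation, where the returned `h_n` and `c_n` have shape (num_layers * num_directions, batch, hidden_size) and can be split into separate layer and direction axes. The sizes and variable names below are made up for illustration and are not taken from the model in the question. The sketch uses `reshape` with `-1` for the batch axis instead of a stored `self.batch`, which is an assumption on my part: it avoids a shape mismatch if a batch is ever smaller than expected.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
num_layers, hidden_units, batch, seq_len, input_size = 2, 4, 3, 5, 6
directions = 2  # bidirectional

lstm = nn.LSTM(input_size, hidden_units, num_layers=num_layers, bidirectional=True)
inputs = torch.randn(seq_len, batch, input_size)  # batch_first=False layout

# h and c have shape (num_layers * directions, batch, hidden_units)
output, (h, c) = lstm(inputs)

# Split the leading axis into (layer, direction); -1 infers the batch size.
# reshape behaves like view here but also copes with non-contiguous tensors.
h_split = h.reshape(num_layers, directions, -1, hidden_units)
c_split = c.reshape(num_layers, directions, -1, hidden_units)

# Sanity check of the layout using the last layer:
# the forward final state is the last time step of the forward half of output,
# the backward final state is the first time step of the backward half.
fwd_ok = torch.allclose(h_split[-1, 0], output[-1, :, :hidden_units])
bwd_ok = torch.allclose(h_split[-1, 1], output[0, :, hidden_units:])

# Average over the direction axis -> (num_layers, batch, hidden_units)
h_mean = h_split.mean(dim=1)
c_mean = c_split.mean(dim=1)

print(h_mean.shape, c_mean.shape, fwd_ok, bwd_ok)
```

If both layout checks come out true, the view in the question is splitting layers and directions in the intended order, and the mean over dim=1 is the per-layer average of the forward and backward states.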