I’ve recently become interested in TorchScript, so I went to look at the custom LSTMs written with TorchScript. In particular, I’m looking at this file. As a result, I have a few questions.
**1.** I understand that combining the weights of the linear transformations is meant to speed up the code. But why are the products computed separately for the inputs and the hidden state, as is done here:
```python
hx, cx = state
gates = (torch.mm(input, self.weight_ih.t()) + self.bias_ih +
         torch.mm(hx, self.weight_hh.t()) + self.bias_hh)
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
```
Can someone explain the benefit of this compared with first concatenating the tensors and computing everything in a single multiplication?
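To make the comparison concrete, here is a minimal sketch of the concatenated version I have in mind. The sizes and the names `x`, `weight_cat`, `gates_separate`, and `gates_concat` are my own assumptions for illustration and are not taken from the file; the biases are left out to keep it short.

```python
import torch

torch.manual_seed(0)
batch, input_size, hidden_size = 3, 5, 4  # arbitrary illustrative sizes

x = torch.randn(batch, input_size)
hx = torch.randn(batch, hidden_size)
weight_ih = torch.randn(4 * hidden_size, input_size)
weight_hh = torch.randn(4 * hidden_size, hidden_size)

# Separate products, as in the file (biases omitted here).
gates_separate = torch.mm(x, weight_ih.t()) + torch.mm(hx, weight_hh.t())

# Concatenated version: one matmul over [x, hx] with the two weight
# matrices stacked side by side along the input dimension.
weight_cat = torch.cat([weight_ih, weight_hh], dim=1)  # (4*hidden, input+hidden)
gates_concat = torch.mm(torch.cat([x, hx], dim=1), weight_cat.t())

# Both versions should agree up to floating-point rounding.
same_gates = torch.allclose(gates_separate, gates_concat, atol=1e-4)
print(same_gates)
```

As far as I can tell, the two give the same result, so my question is only about which one is faster and why.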
**2.** Following on from the previous question: as I understand it, the code above applies a linear transformation to the inputs and to the hidden state separately and then sums the results. However, two biases are added together, as in:
` gates = (torch.mm(input, self.weight_ih.t()) + self.bias_ih + torch.mm(hx, self.weight_hh.t()) + self.bias_hh)`
I think the 2 biases can be replaced by a single one with the exact same effect, so I want to ask what’s the difference between adding 1 bias and 2 biases.
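Here is a small sketch of the equivalence I mean. The tensor `pre_bias` is a hypothetical stand-in for the sum of the two matrix products, and `bias_single` is a name I made up; neither appears in the file.

```python
import torch

torch.manual_seed(0)
batch, hidden_size = 3, 4  # arbitrary illustrative sizes

# Stand-in for torch.mm(input, W_ih.t()) + torch.mm(hx, W_hh.t()).
pre_bias = torch.randn(batch, 4 * hidden_size)
bias_ih = torch.randn(4 * hidden_size)
bias_hh = torch.randn(4 * hidden_size)

# Two biases, as in the file.
gates_two_biases = pre_bias + bias_ih + bias_hh

# One bias that is simply the sum of the two.
bias_single = bias_ih + bias_hh
gates_one_bias = pre_bias + bias_single

# The forward results should agree up to floating-point rounding.
biases_equivalent = torch.allclose(gates_two_biases, gates_one_bias, atol=1e-5)
print(biases_equivalent)
```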
**3.** If combining several operations can speed up the code, then instead of using:

```python
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

ingate = torch.sigmoid(ingate)
forgetgate = torch.sigmoid(forgetgate)
cellgate = torch.tanh(cellgate)
outgate = torch.sigmoid(outgate)
```

why not just:
```python
gates[:, 0:3*self.hidden_size].sigmoid_()  # Doesn't have to be in-place
ingate, forgetgate, outgate, cellgate = gates.chunk(4, 1)
cellgate = cellgate.tanh()
```
Aren’t the operations more “combined together” this way, or is it that combining operations doesn’t affect element-wise operations much?
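For reference, this is a self-contained sketch of the two variants side by side. It assumes the gate columns have been reordered to `[in, forget, out, cell]` so that the three sigmoid gates are contiguous; the `reordered` tensor and the `*_fused` names are hypothetical and only serve the illustration.

```python
import torch

torch.manual_seed(0)
batch, hidden_size = 3, 4  # arbitrary illustrative sizes
gates = torch.randn(batch, 4 * hidden_size)

# Variant from the file: chunk first, then one activation call per gate.
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
in_ref = torch.sigmoid(ingate)
forget_ref = torch.sigmoid(forgetgate)
cell_ref = torch.tanh(cellgate)
out_ref = torch.sigmoid(outgate)

# Proposed variant: assume the layout is [in, forget, out, cell], so a
# single sigmoid call covers the first three gates (out-of-place here).
reordered = torch.cat([ingate, forgetgate, outgate, cellgate], dim=1)
sig = torch.sigmoid(reordered[:, : 3 * hidden_size])
in_fused, forget_fused, out_fused = sig.chunk(3, 1)
cell_fused = torch.tanh(reordered[:, 3 * hidden_size:])

# Both variants should produce the same activations.
fused_matches = (
    torch.allclose(in_ref, in_fused)
    and torch.allclose(forget_ref, forget_fused)
    and torch.allclose(cell_ref, cell_fused)
    and torch.allclose(out_ref, out_fused)
)
print(fused_matches)
```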

Thanks for your time 