I’ve recently become interested in TorchScript, so I went to look at the custom LSTMs written with it. In particular, I’m looking at this file, and I have a few questions.

**1.** I understand that combining the weights of a linear transformation is meant to speed up the code. But why compute it separately for inputs and hiddens, as is done here:

```
hx, cx = state
gates = (torch.mm(input, self.weight_ih.t()) + self.bias_ih +
         torch.mm(hx, self.weight_hh.t()) + self.bias_hh)
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
```

Can someone explain the benefit compared to first concatenating the tensors and computing everything in one matmul?
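To make the alternative I have in mind concrete, here is a small sketch (made-up shapes, plain tensors instead of module parameters) showing that concatenating the inputs and weights gives the same gate pre-activations:

```python
import torch

torch.manual_seed(0)
batch, input_size, hidden_size = 2, 3, 4

input = torch.randn(batch, input_size)
hx = torch.randn(batch, hidden_size)
weight_ih = torch.randn(4 * hidden_size, input_size)
weight_hh = torch.randn(4 * hidden_size, hidden_size)
bias_ih = torch.randn(4 * hidden_size)
bias_hh = torch.randn(4 * hidden_size)

# Separate matmuls, as in the file:
gates_separate = (torch.mm(input, weight_ih.t()) + bias_ih +
                  torch.mm(hx, weight_hh.t()) + bias_hh)

# The concatenated alternative: one matmul over [input, hx]
weight_cat = torch.cat([weight_ih, weight_hh], dim=1)   # shape (4H, I + H)
gates_cat = (torch.mm(torch.cat([input, hx], dim=1), weight_cat.t())
             + bias_ih + bias_hh)

assert torch.allclose(gates_separate, gates_cat, atol=1e-5)
```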

**2.** Following the previous question: as I understand it, the code above applies the linear transformation separately to inputs and hiddens and then adds the results. However, two biases are added, as in:

```
gates = (torch.mm(input, self.weight_ih.t()) + self.bias_ih +
         torch.mm(hx, self.weight_hh.t()) + self.bias_hh)
```

I think the two biases could be replaced by a single one with exactly the same effect, so I want to ask: what is the difference between adding one bias and adding two?
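Here is a quick sanity check of what I mean, with made-up shapes (`pre` stands in for the sum of the two matmuls): adding the two biases separately is numerically the same as adding one fused bias:

```python
import torch

torch.manual_seed(0)
H = 4
pre = torch.randn(2, 4 * H)      # sum of the two matmuls
bias_ih = torch.randn(4 * H)
bias_hh = torch.randn(4 * H)

two_biases = pre + bias_ih + bias_hh
one_bias = pre + (bias_ih + bias_hh)  # a single fused bias parameter

assert torch.allclose(two_biases, one_bias)
```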

**3.** If combining several operations can speed up the code, then instead of:

```
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

ingate = torch.sigmoid(ingate)
forgetgate = torch.sigmoid(forgetgate)
cellgate = torch.tanh(cellgate)
outgate = torch.sigmoid(outgate)
```

why not just:

```
gates[:, 0:3*self.hidden_size].sigmoid_()  # Doesn't have to be in-place
ingate, forgetgate, outgate, cellgate = gates.chunk(4, 1)
cellgate = cellgate.tanh()
```

Aren’t the operations more “combined together” this way, or is it that fusing doesn’t matter much for element-wise operations?
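To show what I mean, here is a self-contained sketch (made-up shapes; note I assume the weight rows could be reordered so that the three sigmoid gates are contiguous, i.e. a layout of `(i, f, o, g)` instead of the file’s `(i, f, g, o)`) verifying that the sliced variant produces the same activations:

```python
import torch

torch.manual_seed(0)
H = 4
gates = torch.randn(2, 4 * H)  # pre-activations in the file's order (i, f, g, o)

# Per-chunk activations, as in the file:
i1, f1, g1, o1 = gates.chunk(4, 1)
i1, f1, g1, o1 = i1.sigmoid(), f1.sigmoid(), g1.tanh(), o1.sigmoid()

# Sliced variant: reorder the layout to (i, f, o, g) so the three sigmoid
# gates are contiguous, then apply sigmoid over one slice:
reordered = torch.cat([gates[:, 0:H], gates[:, H:2*H],
                       gates[:, 3*H:4*H], gates[:, 2*H:3*H]], dim=1)
reordered[:, 0:3*H] = reordered[:, 0:3*H].sigmoid()
i2, f2, o2, g2 = reordered.chunk(4, 1)
g2 = g2.tanh()

assert torch.allclose(i1, i2) and torch.allclose(f1, f2)
assert torch.allclose(o1, o2) and torch.allclose(g1, g2)
```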

Thanks for your time