I've recently become interested in TorchScript, so I went to look at the custom LSTMs written in TorchScript. In particular, I'm looking at this file, and I have a few questions about it.
1. I understand that combining the weights of the four gates into one linear transformation is done to speed up the code. But why is the transformation computed separately for the inputs and the hidden states, as it is here:

    hx, cx = state
    gates = (torch.mm(input, self.weight_ih.t()) + self.bias_ih +
             torch.mm(hx, self.weight_hh.t()) + self.bias_hh)
    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

Can someone explain the benefit of this compared to first concatenating the tensors and computing a single matrix product?
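
To make the comparison concrete, here is a minimal sketch of the concatenated variant I have in mind. The tensor sizes and the names `x`, `xh`, and `weight_cat` are made up for illustration and are not from the file; biases are left out to keep it short.

```python
import torch

torch.manual_seed(0)
batch, input_size, hidden_size = 3, 5, 4  # made-up sizes for illustration

x = torch.randn(batch, input_size, dtype=torch.float64)
hx = torch.randn(batch, hidden_size, dtype=torch.float64)
weight_ih = torch.randn(4 * hidden_size, input_size, dtype=torch.float64)
weight_hh = torch.randn(4 * hidden_size, hidden_size, dtype=torch.float64)

# Two separate products, as in the file.
gates_separate = torch.mm(x, weight_ih.t()) + torch.mm(hx, weight_hh.t())

# Concatenate inputs and weights along the feature dimension, then one product.
xh = torch.cat([x, hx], dim=1)                         # (batch, input + hidden)
weight_cat = torch.cat([weight_ih, weight_hh], dim=1)  # (4 * hidden, input + hidden)
gates_concat = torch.mm(xh, weight_cat.t())

# Both routes should give the same pre-activation gates.
same_gates = torch.allclose(gates_separate, gates_concat)
```

As far as I can tell the two give the same result, so my question is only about which one is faster and why. Note that the concatenated version needs an extra `torch.cat` on the inputs at every time step.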
2. Following on from the previous question: as I understand it, the code above applies a linear transformation separately to the inputs and the hidden states and then adds the results. However, two biases are also added:

    gates = (torch.mm(input, self.weight_ih.t()) + self.bias_ih +
             torch.mm(hx, self.weight_hh.t()) + self.bias_hh)

I think the two biases could be replaced by a single one with exactly the same effect, so what is the difference between adding one bias and adding two?
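
Here is a quick check of what I mean, as a sketch with made-up sizes. The merged `bias` tensor is my own name, not something from the file.

```python
import torch

torch.manual_seed(0)
batch, input_size, hidden_size = 3, 5, 4  # made-up sizes for illustration

x = torch.randn(batch, input_size, dtype=torch.float64)
hx = torch.randn(batch, hidden_size, dtype=torch.float64)
weight_ih = torch.randn(4 * hidden_size, input_size, dtype=torch.float64)
weight_hh = torch.randn(4 * hidden_size, hidden_size, dtype=torch.float64)
bias_ih = torch.randn(4 * hidden_size, dtype=torch.float64)
bias_hh = torch.randn(4 * hidden_size, dtype=torch.float64)

# Two biases, as in the file.
gates_two = (torch.mm(x, weight_ih.t()) + bias_ih +
             torch.mm(hx, weight_hh.t()) + bias_hh)

# One merged bias (hypothetical replacement).
bias = bias_ih + bias_hh
gates_one = torch.mm(x, weight_ih.t()) + torch.mm(hx, weight_hh.t()) + bias

same_bias_effect = torch.allclose(gates_two, gates_one)
```

The forward results match, so the two-bias form seems to add parameters without adding expressive power, unless I'm missing something.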
3. If combining several operations can speed up the code, then instead of using:

    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
    ingate = torch.sigmoid(ingate)
    forgetgate = torch.sigmoid(forgetgate)
    cellgate = torch.tanh(cellgate)
    outgate = torch.sigmoid(outgate)

why not just:

    gates[:, 0:3 * self.hidden_size].sigmoid_()  # Doesn't have to be in-place
    ingate, forgetgate, outgate, cellgate = gates.chunk(4, 1)
    cellgate = cellgate.tanh()

Aren't the operations more "combined together" this way, or is it that combining operations doesn't affect element-wise operations much?
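
For reference, this is a small sketch I used to convince myself the two versions compute the same activations. It assumes the weight rows are laid out so that the three sigmoid gates come first (in, forget, out, cell), which is a different order from the file; I emulate that here by reordering the columns. Sizes are made up, and I use the out-of-place `sigmoid()` rather than `sigmoid_()`.

```python
import torch

torch.manual_seed(0)
batch, hidden_size = 3, 4  # made-up sizes for illustration
gates = torch.randn(batch, 4 * hidden_size, dtype=torch.float64)

# Version from the file: layout is [in, forget, cell, out], one call per gate.
i, f, g, o = gates.chunk(4, 1)
reference = (torch.sigmoid(i), torch.sigmoid(f), torch.tanh(g), torch.sigmoid(o))

# Proposed version: assumes layout [in, forget, out, cell], emulated by reordering.
reordered = torch.cat([i, f, o, g], dim=1)
sig = reordered[:, :3 * hidden_size].sigmoid()  # one sigmoid over three gates
i2, f2, o2 = sig.chunk(3, 1)
g2 = reordered[:, 3 * hidden_size:].tanh()

same_activations = all(
    torch.allclose(a, b) for a, b in zip(reference, (i2, f2, g2, o2))
)
```

So the results agree as long as the gate order is changed consistently; what I don't know is whether this actually saves any time.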
Thanks for your time