How to initialize hidden state in GRU

Yes, you can initialize the hidden state in the forward method. Note, however, that storing it in self.hidden and feeding it back in on the next call makes each forward pass depend on the previous one, which might yield errors. As written, the computation graph stays attached to self.hidden, so the backward call might try to backpropagate through multiple iterations. That can be a valid use case, but you should check whether it is what you actually want, or whether you should detach() the hidden state in each forward pass.
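
Here is a minimal sketch of the detach() variant, in case it helps. The module name GRUModel, the layer sizes, and the toy training loop are all made up for illustration and are not taken from your code; the only real point is the detach() call before the hidden state is reused.

```python
import torch
import torch.nn as nn


class GRUModel(nn.Module):
    # Hypothetical module; names and sizes are illustrative assumptions.
    def __init__(self, input_size=8, hidden_size=16, output_size=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.hidden = None  # carried over between forward calls

    def forward(self, x):
        batch_size = x.size(0)
        if self.hidden is None or self.hidden.size(1) != batch_size:
            # Fresh zero state of shape (num_layers, batch, hidden_size).
            self.hidden = torch.zeros(
                1, batch_size, self.hidden_size, device=x.device, dtype=x.dtype
            )
        else:
            # Keep the values, but cut the link to the previous iteration's graph.
            self.hidden = self.hidden.detach()
        out, self.hidden = self.gru(x, self.hidden)
        return self.fc(out[:, -1])  # prediction from the last time step


model = GRUModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(3):
    x = torch.randn(4, 5, 8)  # (batch, seq_len, input_size), random toy data
    y = torch.randn(4, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # only goes through the current iteration's graph
    optimizer.step()
```

With the detach() line removed, the second loss.backward() in this loop would be expected to fail, since it would reach back into the graph of the first iteration. If you do want gradients to flow across iterations, you would drop the detach() and handle the graph lifetime yourself.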