In the time_sequence_prediction example, the hidden state and cell state are initialized with requires_grad=False. Why is this done? Shouldn't the outputs be differentiated with respect to these states?