I would say it depends on your use case and maybe your workflow.
For example, how would you like to expose the functionality for:
- backpropagating through all seen data (i.e. in PyTorch, just don’t detach the hidden state), or
- using only the last input batch (i.e. detach the hidden state before processing each new batch)?
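A minimal sketch of the two options in PyTorch. It assumes a toy `nn.RNN` that reads one long sequence arriving as consecutive chunks. The sizes, the random data, and the helper names (`step_loss`, `backprop_through_all_seen_data`, `use_only_last_batch`) are hypothetical and only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: a small RNN reading one long sequence that
# arrives as 3 consecutive chunks (batch of 2, 5 time steps, 4 features).
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
chunks = [torch.randn(2, 5, 4) for _ in range(3)]
targets = [torch.randn(2, 1) for _ in range(3)]


def step_loss(x, y, hidden):
    """Forward one chunk; return (loss, new hidden state)."""
    out, hidden = rnn(x, hidden)
    # Predict from the last time step of the chunk.
    return nn.functional.mse_loss(head(out[:, -1]), y), hidden


def backprop_through_all_seen_data():
    """Keep the hidden state attached, so one backward() spans every chunk."""
    hidden, total = None, 0.0
    for x, y in zip(chunks, targets):
        loss, hidden = step_loss(x, y, hidden)  # no detach: the graph keeps growing
        total = total + loss
    total.backward()  # gradients flow back through all chunks seen so far
    return hidden


def use_only_last_batch():
    """Detach the hidden state, so each backward() stops at the chunk boundary."""
    hidden = None
    for x, y in zip(chunks, targets):
        if hidden is not None:
            hidden = hidden.detach()  # cut the graph to previous chunks
        loss, hidden = step_loss(x, y, hidden)
        loss.backward()  # only traverses the current chunk's graph
        # In a real training loop you would call optimizer.step() and
        # optimizer.zero_grad() here; gradients simply accumulate in this sketch.
    return hidden
```

The trade-off: the first variant keeps the whole graph alive, so memory grows with the number of batches seen, while the second (truncated backpropagation through time) bounds memory but its gradients cannot capture dependencies that cross a batch boundary. Calling `backward()` per chunk without the `detach()` would fail, because the next backward pass would try to re-traverse the previous chunk's already-freed graph.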
New proposals for these use cases (and UX) are always welcome.