[solved] Why we need to detach Variable which contains hidden representation?

ptrblck · November 12, 2019, 6:20pm

I would say it depends on your use case and maybe your workflow.
E.g. how would you like to expose the functionality of:

- backpropagating through all seen data (i.e. in PyTorch, just don't detach the hidden state)
- using only the last input batch?
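
To make the two options concrete, here is a minimal sketch of the difference. The tiny `nn.RNN`, the tensor shapes, and the helper name `grad_reaches_first_batch` are assumptions picked for illustration, not code from this thread. It carries the hidden state from a first batch into a second one and checks whether the backward pass still reaches the first batch's input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy recurrent model; the sizes are arbitrary and only for illustration.
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)


def grad_reaches_first_batch(detach_hidden: bool) -> bool:
    """Return True if backprop from the second batch reaches the first batch's input."""
    # requires_grad=True on x1 lets us observe whether a gradient flows back to it.
    x1 = torch.randn(2, 5, 4, requires_grad=True)  # first batch
    x2 = torch.randn(2, 5, 4)                      # second ("last") batch

    # Process the first batch and keep its final hidden state.
    _, hidden = rnn(x1)

    if detach_hidden:
        # Cut the graph here: the values are kept, the history is dropped.
        hidden = hidden.detach()

    # Process the second batch, starting from the carried-over hidden state.
    out, _ = rnn(x2, hidden)
    out.sum().backward()

    return x1.grad is not None


print("no detach, gradient reaches first batch:", grad_reaches_first_batch(False))
print("detach,    gradient reaches first batch:", grad_reaches_first_batch(True))
```

Without the detach, the graph keeps growing with every batch, so memory use and backward time grow with it. With the detach, each backward pass covers only the current batch, while the hidden state values still carry context forward.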

New proposals for these use cases (and UX) are always welcome.