Hello @zetyquickly
I saw one implementation of an LSTM where they detached the hidden state, with this comment:
We need to detach as we are doing truncated backpropagation through time (BPTT)
If we don’t, we’ll backprop all the way to the start even after going through another batch
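To make sure I understand that comment, here is my own rough sketch of the pattern I think it describes (made-up sizes, not their code): the hidden state is carried over between batches, and detach() cuts the autograd graph so backprop stops at the current batch.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
opt = torch.optim.SGD(lstm.parameters(), lr=0.1)

state = None  # (h, c) carried over between batches
for step in range(3):
    x = torch.randn(4, 5, 8)  # dummy batch: (batch, seq, features)
    if state is not None:
        state = tuple(s.detach() for s in state)  # truncate the graph here
    out, state = lstm(x, state)
    loss = out.pow(2).mean()  # dummy loss
    opt.zero_grad()
    loss.backward()  # only reaches back through the current batch
    opt.step()

Without the detach, the second loss.backward() would try to go back through the previous batch's graph as well.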
I turned the detaching part in their code into
...
out, _ = self.lstm(x, (h0, c0))
...
It still gives the same results.
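My guess for why nothing changes, just a sketch with made-up sizes and assuming h0 and c0 are zero tensors created inside forward (which is what I think the ... hides): fresh zeros carry no autograd history, so detaching them is a no-op.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
x = torch.randn(4, 5, 8)  # (batch, seq, features)

# fresh zero states, created right before the forward pass
h0 = torch.zeros(2, 4, 16)  # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, 4, 16)

out_plain, _ = lstm(x, (h0, c0))
out_detached, _ = lstm(x, (h0.detach(), c0.detach()))
print(torch.allclose(out_plain, out_detached))  # prints True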