Hi,
- You need to use retain_graph because .backward() goes through the whole graph (both the encoder and the decoder here). So if you want to be able to backward through the decoder again, you need to retain the graph.
- You can use retain_graph as long as you don't change any value required by the backward. In particular here, the optimizer's step() changes the parameters in place and might prevent you from being able to backward a second time (make sure to run v1.5.0+, as this was fixed recently).
Both of them will work very similarly. You will either do extra work during a backward pass that you don't care about, or do an extra forward pass.
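To make this concrete, here is a rough sketch of the two options and of the optimizer caveat. The tiny encoder/decoder, the losses, and all function names are made up for illustration and are not your actual model. The last function is expected to report an error on recent versions because step() touches the decoder weight that the second backward still needs, but I have not pinned the exact behaviour per version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the encoder/decoder from the question.
encoder = nn.Linear(4, 3)
decoder = nn.Linear(3, 4)
x = torch.randn(8, 4)


def zero_grads():
    encoder.zero_grad()
    decoder.zero_grad()


def second_backward_raises():
    """Backward twice through the same graph without retain_graph."""
    zero_grads()
    loss = decoder(encoder(x)).pow(2).mean()
    loss.backward()  # frees the buffers saved for the whole graph
    try:
        loss.backward()
    except RuntimeError:
        return True
    return False


def encoder_grad_with_retain_graph():
    """One forward, two backwards; the first one keeps the graph alive."""
    zero_grads()
    z = encoder(x)
    loss_a = decoder(z).pow(2).mean()
    loss_b = z.pow(2).mean()
    loss_a.backward(retain_graph=True)
    loss_b.backward()  # reuses the encoder part of the graph
    return encoder.weight.grad.clone()


def encoder_grad_with_extra_forward():
    """No retain_graph; the encoder is simply run a second time."""
    zero_grads()
    decoder(encoder(x)).pow(2).mean().backward()
    encoder(x).pow(2).mean().backward()
    return encoder.weight.grad.clone()


def step_between_backwards_raises():
    """Call optimizer.step() between the two backwards."""
    zero_grads()
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.SGD(params, lr=0.1)
    out = decoder(encoder(x))
    out.pow(2).mean().backward(retain_graph=True)
    opt.step()  # modifies the parameters in place
    try:
        out.abs().mean().backward()  # still needs the old decoder weight
    except RuntimeError:
        return True
    return False
```

The two encoder_grad_* functions accumulate the same gradients on the encoder, so which one you pick is mostly a question of whether you would rather keep the graph in memory or pay for a second forward.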