How can I set up a model/optimizer so that I can take multiple optimization steps on, say, a training loss, and then compute the gradient of the final training loss with respect to the starting parameters, or with respect to per-example weights on my training data?
When I try to set this up now, I don’t get any gradients using the model/optimizer abstractions suggested in the PyTorch tutorials. The optimizer failing makes sense to me: `optimizer.step()` updates parameters in place by modifying their underlying storage, so the computation graph presumably never records those updates. What’s unclear is why I still can’t get gradients even when I loop over the model’s parameters and apply gradient descent manually. Thanks!
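For concreteness, here’s a minimal sketch of the manual loop I’m attempting. The toy problem and the names (`w0`, `inner_lr`, etc.) are just mine for illustration, not from any tutorial:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 1)
y = 3.0 * x

w0 = torch.tensor([[1.0]], requires_grad=True)  # the "starting parameters"
inner_lr = 0.1

# Manual inner-loop SGD, mimicking what I understand optimizer.step() to do.
w = w0.detach().clone().requires_grad_(True)
for _ in range(2):
    inner_loss = ((x @ w - y) ** 2).mean()
    inner_loss.backward()
    with torch.no_grad():
        w -= inner_lr * w.grad  # in-place update on the parameter
        w.grad.zero_()

final_loss = ((x @ w - y) ** 2).mean()
final_loss.backward()
print(w0.grad)  # None
```

This runs, but `w0.grad` comes back `None`, which I take to mean there is no path in the graph from `final_loss` back to `w0`; I assume I’m severing it somewhere, but I don’t see where or what the idiomatic alternative is.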