Hi,

I have a question about how to implement a training pipeline in PyTorch. Specifically, I'd like to run gradient descent inside a single module that's part of a larger multi-module training pipeline. In other words, I want the inner gradient-descent steps to be part of the larger computational graph.

For example, I have a training pipeline consisting of 5 modules:

- preprocess input batch (x, y_ground_truth)
- perform N steps of gradient descent to find a v that minimizes [v + A - NN(v)], where A is a constant
- postprocess(v) → y_pred
- compute loss L = mean((y_pred - y_ground_truth)^2)
- backprop to train NN with L.backward(), optimizer.step()

In this setup, module 2 uses the NN, so running gradient descent there requires computing gradients that depend on the NN's parameters. Essentially, I want to know how to build module 2. Any help would be greatly appreciated, thanks!
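To make the question concrete, here is a minimal sketch of what I imagine module 2 could look like: N unrolled gradient-descent steps kept on the graph via `torch.autograd.grad(..., create_graph=True)`, so that `L.backward()` in module 5 differentiates through the inner updates. The network shape, learning rates, and step counts are placeholders, and I've written the inner objective as the squared residual (v + A - NN(v))^2 so it's bounded below — not sure if that matches my intent exactly:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small network standing in for NN
net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

A = 0.5          # the constant A
N_INNER = 5      # inner gradient-descent steps (N)
INNER_LR = 0.1   # inner-loop step size

def inner_solve(v0):
    """Module 2: N unrolled gradient-descent steps on the inner
    objective f(v) = mean((v + A - NN(v))**2), kept differentiable."""
    v = v0
    for _ in range(N_INNER):
        f = ((v + A - net(v)) ** 2).mean()
        # create_graph=True keeps the inner steps on the graph,
        # so the outer backward pass sees through the updates.
        (g,) = torch.autograd.grad(f, v, create_graph=True)
        v = v - INNER_LR * g
    return v

# One outer training step on a toy batch
x = torch.randn(8, 1)
y_ground_truth = torch.randn(8, 1)

v0 = x.clone().requires_grad_(True)       # module 1: preprocessed init
v = inner_solve(v0)                       # module 2: inner descent
y_pred = v                                # module 3: identity postprocess here
loss = ((y_pred - y_ground_truth) ** 2).mean()  # module 4

optimizer.zero_grad()
loss.backward()                           # module 5: grads flow through module 2
optimizer.step()
```

Would something along these lines be the right approach, or is there a more idiomatic way to express the inner loop?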