Using gradient descent inside a larger gradient-based training pipeline

Hi,

I have a question about how to implement a training pipeline in PyTorch. I'd like to run gradient descent inside a single module that's part of a larger multi-module training pipeline. In other words, I want the inner gradient-descent steps to be part of the larger computational graph.

For example, I have a training pipeline consisting of 5 modules:

  1. preprocess input batch (x, y_ground_truth)
  2. perform N steps of gradient descent to find v that minimizes a scalar objective such as ||v + A - NN(v)||^2, where A is a constant (see the sketch after this list)
  3. postprocess(v) → y_pred
  4. compute loss L = mean((y_pred - y_ground_truth)^2)
  5. backprop to train NN with L.backward() and optimizer.step()
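
For concreteness, here is a minimal sketch of the outer loop I have in mind. The `preprocess`/`postprocess` stubs, the dummy `loader`, the network shape, and the values of `A` and `N` are all placeholders, and `inner_descent` is my attempt at module 2 (shown further down):

```python
import torch

# Minimal stand-ins for modules 1 and 3 (my real versions are more involved):
preprocess = lambda x, y: (x, y)                    # module 1 placeholder
postprocess = lambda v: v                           # module 3 placeholder
loader = [(torch.randn(4, 8), torch.randn(4, 8))]   # dummy batch for illustration

nn_model = torch.nn.Sequential(                     # stand-in for NN
    torch.nn.Linear(8, 8), torch.nn.Tanh(), torch.nn.Linear(8, 8))
optimizer = torch.optim.Adam(nn_model.parameters(), lr=1e-3)
A = 0.5     # the constant from step 2 (placeholder value)
N = 10      # number of inner gradient-descent steps

for x, y_ground_truth in loader:
    x, y_ground_truth = preprocess(x, y_ground_truth)   # module 1
    v = inner_descent(nn_model, x, A, n_steps=N)        # module 2 (my attempt below)
    y_pred = postprocess(v)                             # module 3
    L = torch.mean((y_pred - y_ground_truth) ** 2)      # module 4
    optimizer.zero_grad()
    L.backward()   # module 5: gradients flow through the unrolled inner steps
    optimizer.step()
```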

In this setup, you can see that module 2 uses the NN, and in order to do gradient descent on v, I need to compute gradients that themselves depend on the NN's parameters; the outer backward pass then has to flow through those inner steps to train the NN. Essentially, I want to know how to build module 2. Any help would be greatly appreciated, thanks!
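
Here is my current attempt at module 2. My understanding is that passing `create_graph=True` to `torch.autograd.grad` makes the inner gradient itself a differentiable node, so the outer `L.backward()` can reach the NN's parameters through the unrolled updates. The squared-residual objective, the initialization of v from x, and `inner_lr` are assumptions on my part:

```python
import torch

def inner_descent(nn_model, x, A, n_steps=10, inner_lr=0.1):
    # Initialize v from the input; detach so v is a fresh leaf that requires grad.
    v = x.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        # Assumed inner objective: squared residual ||v + A - NN(v)||^2.
        inner_loss = ((v + A - nn_model(v)) ** 2).sum()
        # create_graph=True keeps the gradient computation in the graph, so the
        # outer loss can backprop through every inner update into nn_model.
        (grad_v,) = torch.autograd.grad(inner_loss, v, create_graph=True)
        # Out-of-place update: v stays a node in the outer computational graph.
        v = v - inner_lr * grad_v
    return v
```

One thing I'm unsure about is memory: unrolling N inner steps keeps N copies of the inner graph alive until `L.backward()`. Is this manual unrolling the idiomatic way to do it, or should I be looking at a library like `higher` for this?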