Using gradient descent inside a larger gradient-based training pipeline

Hi,

I have a question about how to implement a training pipeline in PyTorch. I'd like to run gradient descent inside a single module that's part of a larger multi-module training pipeline. In other words, I want the inner gradient-descent steps to be part of the larger computational graph.

For example, I have a training pipeline consisting of 5 modules:

  1. preprocess input batch (x, y_ground_truth)
  2. perform N steps of gradient descent to find v that minimizes a scalar objective such as ||v + A - NN(v)||^2, where A is a constant (see the sketch after this list)
  3. postprocess(v) → y_pred
  4. compute loss L = mean((y_pred - y_ground_truth)^2)
  5. backprop to train NN with L.backward() and optimizer.step()
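
For concreteness, here is a minimal sketch of the outer loop I have in mind. The `preprocess`/`postprocess` stubs, the dummy `loader`, the network shape, and the values of `A` and `N` are all placeholders, and `inner_descent` is my attempt at module 2 (shown further down):

```python
import torch

# Minimal stand-ins for modules 1 and 3 (my real versions are more involved):
preprocess = lambda x, y: (x, y)                    # module 1 placeholder
postprocess = lambda v: v                           # module 3 placeholder
loader = [(torch.randn(4, 8), torch.randn(4, 8))]   # dummy batch for illustration

nn_model = torch.nn.Sequential(                     # stand-in for NN
    torch.nn.Linear(8, 8), torch.nn.Tanh(), torch.nn.Linear(8, 8))
optimizer = torch.optim.Adam(nn_model.parameters(), lr=1e-3)
A = 0.5     # the constant from step 2 (placeholder value)
N = 10      # number of inner gradient-descent steps

for x, y_ground_truth in loader:
    x, y_ground_truth = preprocess(x, y_ground_truth)   # module 1
    v = inner_descent(nn_model, x, A, n_steps=N)        # module 2 (my attempt below)
    y_pred = postprocess(v)                             # module 3
    L = torch.mean((y_pred - y_ground_truth) ** 2)      # module 4
    optimizer.zero_grad()
    L.backward()   # module 5: gradients flow through the unrolled inner steps
    optimizer.step()
```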

In this setup, you can see that module 2 uses the NN, and in order to do gradient descent on v, I need to compute gradients that themselves depend on the NN's parameters; the outer backward pass then has to flow through those inner steps to train the NN. Essentially, I want to know how to build module 2. Any help would be greatly appreciated, thanks!
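
Here is my current attempt at module 2. My understanding is that passing `create_graph=True` to `torch.autograd.grad` makes the inner gradient itself a differentiable node, so the outer `L.backward()` can reach the NN's parameters through the unrolled updates. The squared-residual objective, the initialization of v from x, and `inner_lr` are assumptions on my part:

```python
import torch

def inner_descent(nn_model, x, A, n_steps=10, inner_lr=0.1):
    # Initialize v from the input; detach so v is a fresh leaf that requires grad.
    v = x.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        # Assumed inner objective: squared residual ||v + A - NN(v)||^2.
        inner_loss = ((v + A - nn_model(v)) ** 2).sum()
        # create_graph=True keeps the gradient computation in the graph, so the
        # outer loss can backprop through every inner update into nn_model.
        (grad_v,) = torch.autograd.grad(inner_loss, v, create_graph=True)
        # Out-of-place update: v stays a node in the outer computational graph.
        v = v - inner_lr * grad_v
    return v
```

One thing I'm unsure about is memory: unrolling N inner steps keeps N copies of the inner graph alive until `L.backward()`. Is this manual unrolling the idiomatic way to do it, or should I be looking at a library like `higher` for this?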