Training a recursive model on the last call only?

I have a model trained this way:

G = model()
out1 = G(inp)
out2 = G(out1)
loss = criterion(out2, y)

My question: if I calculate the gradient this way, it will be calculated on the unfolded model G. I would like to calculate the gradient on the second call only, taking the output of the first call as the input. Does detach() solve this, or should I re-create a variable?

detach() should work in this case, as it returns a tensor/variable with an empty autograd history.
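
For example, something like this (a minimal sketch; nn.Linear and MSELoss are just stand-ins for your actual model and criterion):

import torch
import torch.nn as nn

G = nn.Linear(10, 10)        # stand-in for your model
criterion = nn.MSELoss()     # stand-in for your criterion
inp, y = torch.randn(4, 10), torch.randn(4, 10)

out1 = G(inp)
out2 = G(out1.detach())      # out1 is treated as a constant input, no history attached

loss = criterion(out2, y)
loss.backward()              # gradients only come from the second call of G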

What if I want to apply a cost function to the output of the first call and update the parameters, and at the same time use that output as the input to the second call without calculating the gradient on the first? Like this:

G = model()
out1 = G(inp)
loss1 = criterion(out1, y)
out2 = G(out1)
loss2 = criterion(out2, y)

This should also work with detach(), as long as you don't call backward() twice on the same gradient history (because the graph is freed during backward()). If you want to backpropagate through the same history (or parts of it) multiple times, as is usually done in recurrent networks, you have to specify retain_graph=True in loss_value.backward().
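
For reference, the retain_graph case looks roughly like this, i.e. when both losses share the same (non-detached) history (again just a sketch with stand-in modules):

import torch
import torch.nn as nn

G = nn.Linear(10, 10)              # stand-in model and criterion again
criterion = nn.MSELoss()
inp, y = torch.randn(4, 10), torch.randn(4, 10)

out1 = G(inp)
out2 = G(out1)                     # no detach: out2's history includes the first call

loss1 = criterion(out1, y)
loss2 = criterion(out2, y)

loss1.backward(retain_graph=True)  # keep the shared graph alive for the second backward
loss2.backward()                   # backprops through both calls, then frees the graph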

The first pass will be trained on a different loss than the second pass, so I don't need to retain the graph.
The idea is to run the model once and optimize its parameters, then run it again using the output of the first run and optimize its parameters again, without the gradient being calculated on the unfolded model across the two runs.

This should work with a simple detach() on the output of the first forward pass between the optimizations.
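
Roughly like this (again a sketch; the Linear model, MSELoss and SGD optimizer are placeholders for your actual setup):

import torch
import torch.nn as nn

G = nn.Linear(10, 10)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(G.parameters(), lr=0.01)
inp, y = torch.randn(4, 10), torch.randn(4, 10)

# first pass: optimize on the first loss
out1 = G(inp)
loss1 = criterion(out1, y)
optimizer.zero_grad()
loss1.backward()
optimizer.step()

# second pass: detach out1 so gradients only cover the second call
out2 = G(out1.detach())
loss2 = criterion(out2, y)
optimizer.zero_grad()
loss2.backward()
optimizer.step()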
