Training a recursive model on the last call only?

I have a model trained this way:

G = model()
out1 = G(inp)
out2 = G(out1)
loss = criterion(out2, y)

My question: if I calculate the gradient this way, it will be calculated on the unfolded model G. I would like to calculate the gradient on the second call only, taking the output of the first call as the input. Does detach() solve this, or should I re-create a variable?

detach() should work in this case, as it returns a tensor/variable with an empty autograd history.
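
For example, something like this (a minimal sketch; nn.Linear and MSELoss are just stand-ins for your actual model and criterion):

import torch
import torch.nn as nn

G = nn.Linear(10, 10)        # stand-in for your model
criterion = nn.MSELoss()     # stand-in for your criterion
inp, y = torch.randn(4, 10), torch.randn(4, 10)

out1 = G(inp)
out2 = G(out1.detach())      # out1 is treated as a constant input, no history attached

loss = criterion(out2, y)
loss.backward()              # gradients only come from the second call of G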

What if I want to apply a cost function to the output of the first call and update the parameters, and at the same time use that output as the input to the second call without calculating the gradient on the first? Like this:

G = model()
out1 = G(inp)
loss1 = criterion(out1, y)
out2 = G(out1)
loss2 = criterion(out2, y)

This should also work with detach(), as long as you don't call backward() twice on the same gradient history (because the graph is freed during backward()). If you want to backpropagate through the same history (or parts of it) multiple times, as is usually done in recurrent networks, you have to specify retain_graph=True in loss_value.backward().
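
For reference, the retain_graph case looks roughly like this, i.e. when both losses share the same (non-detached) history (again just a sketch with stand-in modules):

import torch
import torch.nn as nn

G = nn.Linear(10, 10)              # stand-in model and criterion again
criterion = nn.MSELoss()
inp, y = torch.randn(4, 10), torch.randn(4, 10)

out1 = G(inp)
out2 = G(out1)                     # no detach: out2's history includes the first call

loss1 = criterion(out1, y)
loss2 = criterion(out2, y)

loss1.backward(retain_graph=True)  # keep the shared graph alive for the second backward
loss2.backward()                   # backprops through both calls, then frees the graph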

The first pass will be trained on a different loss than the second pass, so I don't need to retain the graph.
The idea is to run the model once and optimize its parameters, then run it again using the output of the first run and optimize its parameters again, without the gradient being calculated on the unfolded model across the two runs.

This should work with a simple detach() on the output of the first forward pass between the optimizations.
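
Roughly like this (again a sketch; the Linear model, MSELoss and SGD optimizer are placeholders for your actual setup):

import torch
import torch.nn as nn

G = nn.Linear(10, 10)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(G.parameters(), lr=0.01)
inp, y = torch.randn(4, 10), torch.randn(4, 10)

# first pass: optimize on the first loss
out1 = G(inp)
loss1 = criterion(out1, y)
optimizer.zero_grad()
loss1.backward()
optimizer.step()

# second pass: detach out1 so gradients only cover the second call
out2 = G(out1.detach())
loss2 = criterion(out2, y)
optimizer.zero_grad()
loss2.backward()
optimizer.step()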
