Hi! I am trying to train two models. The problem is a regression, and the setup is as follows:
There are two networks, F and G. F turns an input x into a feature vector F(x); G takes F(x) and gives G(F(x)).
This leads to two questions:
1. Since G’s input is F’s output, once F converges I need to stop updating F and keep training G.
The way I ‘lock’ F is to put loss.backward() and optimizer.step() under an if statement.
Does this do what I think it does (lock F, keep training G)?
2. Following the rule ‘total parameters = 2 * number of samples + dimension of input’, should ‘total parameters’ mean the parameters of F plus the parameters of G, or should it apply to F and G separately? (I.e., if total parameters = 10, should F and G have 10 parameters each, or should their parameter counts add up to 10?)
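To make question 1 concrete, here is a minimal sketch of the kind of ‘lock’ I mean. Everything in it is a placeholder rather than my real code: the layer sizes, the random data, the names opt_F / opt_G / f_converged, and the choice to ‘converge’ at step 5. It also assumes F and G each have their own optimizer (plain SGD without momentum), and that only F’s optimizer step sits under the if, since loss.backward() still has to run for G to receive gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder models and data (hypothetical sizes, not the real setup).
F = nn.Linear(4, 3)   # feature extractor: x -> F(x)
G = nn.Linear(3, 1)   # regressor: F(x) -> G(F(x))
opt_F = torch.optim.SGD(F.parameters(), lr=0.1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(16, 4)
y = torch.randn(16, 1)

f_converged = False   # hypothetical flag; a real run would use a convergence test
f_snapshot, g_snapshot = None, None

for step in range(10):
    if step == 5:
        # Pretend F has converged here and remember both sets of weights.
        f_converged = True
        f_snapshot = [p.detach().clone() for p in F.parameters()]
        g_snapshot = [p.detach().clone() for p in G.parameters()]

    opt_F.zero_grad()
    opt_G.zero_grad()
    loss = loss_fn(G(F(x)), y)
    loss.backward()          # gradients are still computed for both F and G

    if not f_converged:
        opt_F.step()         # F is only updated while the flag is False
    opt_G.step()             # G is always updated

f_unchanged = all(torch.equal(a, b) for a, b in zip(f_snapshot, F.parameters()))
g_changed = any(not torch.equal(a, b) for a, b in zip(g_snapshot, G.parameters()))
print("F unchanged after lock:", f_unchanged)
print("G changed after lock:", g_changed)
```

In this sketch F’s weights are compared against a snapshot taken at the moment of the lock, which seems like a simple way to check whether the lock is doing what I expect.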
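For question 2, the two readings of the rule can be written out as a small calculation. The sample count and input dimension below are made-up numbers chosen only so that the rule gives 10, matching the example above; parameter_budget is a hypothetical helper name.

```python
def parameter_budget(n_samples: int, input_dim: int) -> int:
    """Rule quoted above: total parameters = 2 * number of samples + dimension of input."""
    return 2 * n_samples + input_dim

# Made-up numbers chosen so the budget comes out to 10.
budget = parameter_budget(n_samples=4, input_dim=2)

# Reading A: the budget is shared, so params(F) + params(G) == budget.
shared_total = budget

# Reading B: the budget applies to each network, so F and G get `budget` each.
separate_total = 2 * budget

print(budget, shared_total, separate_total)  # prints: 10 10 20
```

So the question is whether the combined model F followed by G should have 10 parameters (reading A) or 20 (reading B).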
Any answers, critiques, and questions would be helpful!
Many thanks!