Using two neural network modules to optimize only one

It’s necessary. Even if you pass only former_model.parameters() to the optimizer, so that latter_model’s parameters are never updated, gradients for latter_model are still computed during the backward pass unless you set requires_grad = False on its parameters, and those gradient tensors take up a lot of GPU memory.
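A minimal sketch of the setup, assuming former_model feeds into latter_model and only former_model should be trained (the layer shapes here are just placeholders):

```python
import torch
import torch.nn as nn

former_model = nn.Linear(32, 64)   # module we want to optimize
latter_model = nn.Linear(64, 10)   # module we want to keep frozen

# Freeze latter_model: no .grad tensors are allocated for its
# parameters during backward(), which saves GPU memory.
for p in latter_model.parameters():
    p.requires_grad_(False)

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(former_model.parameters(), lr=1e-3)

x = torch.randn(8, 32)
target = torch.randint(0, 10, (8,))

optimizer.zero_grad()
out = latter_model(former_model(x))
loss = nn.functional.cross_entropy(out, target)
loss.backward()   # gradients still flow through latter_model back to
                  # former_model, but latter_model's parameters get no .grad
optimizer.step()
```

Note that freezing latter_model’s parameters does not block the gradient flow back to former_model, because the activations passed between the two modules still require grad.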