Autograd tracking and model weight freezing

Hello, I am still confused about some of the mechanisms in PyTorch 1.0.

  1. How do I freeze weights? Some people give examples like this:
for param in model.parameters():
   param.requires_grad = False

Then, if every parameter is set to requires_grad=False, what happens if the input tensor has requires_grad=True, or vice versa?
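
To make the case above concrete, here is a minimal sketch. The layer sizes and the use of a single nn.Linear as the "model" are assumptions for illustration only. It freezes every parameter, feeds an input with requires_grad=True, and then checks where gradients end up; it also checks the reverse case, where neither the input nor the parameters require gradients.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for "model"; any nn.Module would do.
model = nn.Linear(4, 2)
for param in model.parameters():
    param.requires_grad = False  # freeze all weights

# Case 1: frozen parameters, input that requires grad.
x = torch.randn(3, 4, requires_grad=True)
out = model(x).sum()
out.backward()

input_has_grad = x.grad is not None
params_have_grad = any(p.grad is not None for p in model.parameters())

# Case 2: frozen parameters, plain input. Nothing in the graph needs
# gradients, so the output is not tracked and cannot be backpropagated.
plain_out = model(torch.randn(3, 4))
plain_out_tracked = plain_out.requires_grad

print(input_has_grad, params_have_grad, plain_out_tracked)
```

In this sketch the gradient reaches the input while the frozen parameters keep grad as None, and in the second case the output is not part of any autograd graph at all.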

  2. Is there any difference between a tensor's requires_grad and layer.weight.requires_grad?

  3. I am doing something unusual: I have two neural network models, say A and B. A's weights are trainable but B's weights are frozen, so B's weights have requires_grad=False. In my network, A processes the input; partway through, the intermediate result of A is fed to B. B's output is then combined with the intermediate result of A, and A processes the combined features to produce the final output of the network. I have implemented this, and the loss becomes NaN. I suspect autograd fails to track the graph because the combined result mixes frozen and unfrozen weights. How do I do this properly?
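
A minimal sketch of the A/B setup described above may help when reproducing the problem. Everything here is assumed for illustration: the class name Net, the attribute names a_front, b, and a_back, the layer sizes, concatenation as the way of "combining", and the MSE loss. It is not the actual model from this question. The sketch runs one training step and checks which parameters receive gradients and whether the loss is finite.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Toy version of the described network (all names are hypothetical)."""

    def __init__(self):
        super().__init__()
        self.a_front = nn.Linear(8, 16)  # first half of A (trainable)
        self.b = nn.Linear(16, 16)       # stand-in for B (frozen)
        self.a_back = nn.Linear(32, 1)   # second half of A (trainable)
        for p in self.b.parameters():
            p.requires_grad = False

    def forward(self, x):
        h = torch.relu(self.a_front(x))          # intermediate A result
        b_out = self.b(h)                        # fed through frozen B
        combined = torch.cat([h, b_out], dim=1)  # combine B output with A
        return self.a_back(combined)

torch.manual_seed(0)
net = Net()
# Give the optimizer only the parameters that are still trainable.
optimizer = torch.optim.SGD(
    [p for p in net.parameters() if p.requires_grad], lr=0.01
)

x = torch.randn(4, 8)
target = torch.randn(4, 1)
loss = nn.functional.mse_loss(net(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

a_params = list(net.a_front.parameters()) + list(net.a_back.parameters())
a_grads_ok = all(p.grad is not None for p in a_params)
b_grads_none = all(p.grad is None for p in net.b.parameters())
loss_is_finite = torch.isfinite(loss).item()
```

In this toy version, gradients flow through the frozen B back into A, B's own weights get no gradient, and the loss stays finite. If the same pattern gives NaN in a real model, the mix of frozen and trainable weights alone is probably not the cause.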

  4. Following on from my third question, what happens to a model whose parameters have requires_grad=False (like the B part) if it runs under this statement?

with torch.set_grad_enabled(True):
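
A small sketch of that situation, assuming a single frozen nn.Linear as a stand-in for B (the name b and the sizes are made up for illustration). It runs the frozen module inside the context manager and checks whether the requires_grad flags or the output tracking change.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the frozen B part.
b = nn.Linear(4, 2)
for p in b.parameters():
    p.requires_grad = False

x = torch.randn(3, 4)  # plain input, no requires_grad

with torch.set_grad_enabled(True):
    out = b(x)
    # The context manager switches gradient mode on or off globally;
    # it does not change the per-parameter requires_grad flags.
    flags_inside = [p.requires_grad for p in b.parameters()]

out_tracked = out.requires_grad
print(flags_inside, out_tracked)
```

In this sketch the parameters stay frozen inside the block, and the output is untracked because nothing that went into it requires gradients.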
  5. I am still confused about the difference between model.train(False) and model.eval(). Do I need to call both model.train(False) and model.eval() for every validation and test step?
    Put simply, I have a model that uses dropout layers. I am doing this:
    In the training phase -> model.train(True)
    In the validation phase -> model.train(False)
    In the testing phase -> model.eval()
    However, I found that my model does not work properly: I must remove model.eval() to get the best result. Later I tried model.train(False) followed by model.eval() in the validation phase, but the result was still poor, and I again had to remove model.eval() in the testing phase. Could anyone explain this behavior? What should I do in the validation and testing phases? Is it enough to use only model.train(False)? And what happens if a tensor has requires_grad=False but the model is in model.train(True) mode?
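
For comparing the two calls, here is a minimal sketch. The tiny Sequential model with one dropout layer is an assumption for illustration. It checks the training flag after model.train(False) and after model.eval(), and confirms that dropout is inactive in that mode by running the same input twice.

```python
import torch
import torch.nn as nn

# Hypothetical small model with a dropout layer.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

model.train(False)
mode_after_train_false = model.training

model.train(True)
model.eval()
mode_after_eval = model.training

# With dropout disabled, two forward passes on the same input match.
x = torch.ones(2, 4)
with torch.no_grad():
    same_output = torch.equal(model(x), model(x))

# The train/eval flag is separate from requires_grad: switching back to
# training mode does not touch any tensor's requires_grad setting.
model.train(True)
x_still_untracked = not x.requires_grad
```

Both calls leave model.training set to False, so one of them per validation or test phase should be enough. If removing model.eval() changes the results, the cause is more likely how the layers behave in evaluation mode (for example dropout or batch normalization statistics) than a difference between the two calls.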