Hello, I am still confused about some of the mechanisms in PyTorch 1.0.
- How do I freeze weights? Some people give examples like this:
for param in model.parameters():
    param.requires_grad = False
If every parameter is set to requires_grad=False, what happens when the input tensor has requires_grad=True, or vice versa?
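To make the two cases in this first question concrete, here is a minimal sketch. The single nn.Linear layer and the tensor shapes are assumptions chosen for illustration and stand in for the real model. It records which tensors end up with gradients in each case.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for "model": a single linear layer.
model = nn.Linear(4, 2)

# Freeze every parameter, as in the loop from the question.
for param in model.parameters():
    param.requires_grad = False

# Case 1: frozen weights, input tensor with requires_grad=True.
x = torch.randn(3, 4, requires_grad=True)
out = model(x)
out.sum().backward()
out_tracks_grad = out.requires_grad                  # graph is built because of x
input_got_grad = x.grad is not None                  # gradient reaches the input
frozen_weight_got_grad = model.weight.grad is not None

# Case 2 (vice versa): trainable weights, input without requires_grad.
for param in model.parameters():
    param.requires_grad = True
x2 = torch.randn(3, 4)
out2 = model(x2)
out2.sum().backward()
trainable_weight_got_grad = model.weight.grad is not None
plain_input_got_grad = x2.grad is not None
```

In this sketch the frozen weights receive no gradient while the input does, and in the reverse case the weights receive a gradient while the plain input does not.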
-
Is there any difference between a tensor's requires_grad and layer.weight.requires_grad?
-
I am doing something unusual. I have two neural networks, call them A and B. A's weights are trainable but B's weights are frozen, so B.weight is set to requires_grad=False. In my network, A processes the input, and partway through, A's intermediate result is fed to B. B's output is then combined with A's intermediate result, and the rest of A processes the combined features to produce the final output. I have implemented this, and the loss becomes NaN. I suspect autograd fails to track gradients because the combined result mixes frozen and unfrozen weights. How should I do this properly?
-
As in my third question, what happens to a model whose parameters have requires_grad=False (like the B part) if we run it under this statement?
with torch.set_grad_enabled(True):
    model(inputs)
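The A/B arrangement from the third question, run under torch.set_grad_enabled(True), can be sketched as follows. Everything here is hypothetical: the class name Combined, the layer sizes, the use of torch.cat to combine features, and the squared-output loss are placeholders, since the original code is not shown.

```python
import math
import torch
import torch.nn as nn

class Combined(nn.Module):
    """Hypothetical layout: first part of A -> frozen B -> concat -> rest of A."""
    def __init__(self):
        super().__init__()
        self.a_front = nn.Linear(8, 16)   # first part of A (trainable)
        self.b = nn.Linear(16, 16)        # network B (frozen)
        self.a_back = nn.Linear(32, 1)    # rest of A, sees [A features, B output]
        for p in self.b.parameters():
            p.requires_grad = False

    def forward(self, x):
        h = torch.relu(self.a_front(x))   # intermediate result of A
        b_out = self.b(h)                 # B processes A's intermediate result
        return self.a_back(torch.cat([h, b_out], dim=1))

torch.manual_seed(0)
net = Combined()
# Hand only the trainable (A) parameters to the optimizer.
optimizer = torch.optim.SGD([p for p in net.parameters() if p.requires_grad], lr=0.01)

inputs = torch.randn(4, 8)
with torch.set_grad_enabled(True):
    loss = net(inputs).pow(2).mean()      # placeholder loss
loss.backward()
optimizer.step()

loss_is_finite = math.isfinite(loss.item())
a_front_has_grad = net.a_front.weight.grad is not None
b_has_grad = net.b.weight.grad is not None
```

In a layout like this, autograd still backpropagates through B's operations to reach A's earlier layers. Freezing only stops B's own parameters from accumulating gradients. The sketch does not reproduce the NaN loss, so that may have a separate cause that is not visible from the description alone.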
- I am still confused about the difference between model.train(False) and model.eval(). Do I need to call both model.train(False) and model.eval() for every validation and test step?
Put simply, I have a model that uses dropout layers. I am doing this:
In the training phase -> model.train(True)
In the validation phase -> model.train(False)
In the testing phase -> model.eval()
However, I found that my model does not work properly: I must remove model.eval() to get the best result. Later I tried calling model.train(False) followed by model.eval() in the validation phase. Again the result was not good, and I had to remove model.eval() in the testing phase. Could anyone explain this phenomenon? What should I do in the validation and testing phases? Is it enough to use only model.train(False)? And what if the tensor has requires_grad=False but the model is in model.train(True)?
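As a point of reference for the train/eval questions, the sketch below checks the mode flag after each call. The two-layer nn.Sequential model is a made-up stand-in for a model with dropout. model.eval() simply calls model.train(False), so both should leave the module in the same state.

```python
import torch
import torch.nn as nn

# Hypothetical model containing a dropout layer.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

model.train(True)
flag_in_training = model.training          # dropout is active in this mode

model.train(False)
flag_after_train_false = model.training

model.train(True)
model.eval()
flag_after_eval = model.training           # same flag as after train(False)

# In evaluation mode dropout acts as the identity, so repeated passes agree.
with torch.no_grad():
    y1 = model(x)
    y2 = model(x)
same_output_in_eval = torch.equal(y1, y2)

# The train/eval flag is separate from requires_grad on the parameters.
params_require_grad = all(p.requires_grad for p in model.parameters())
```

Here the train/eval flag only changes how layers such as dropout behave in the forward pass. It does not touch requires_grad, which is why the two settings can be combined freely.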