Initializing parameters with weight or weight.data

(Zhenlan Wang) #1

I have seen people do both: torch.nn.init.normal_(m.weight) or torch.nn.init.normal_(m.weight.data). Could someone tell me which is the right way and why? Thanks.





As far as I can tell, there is no difference between them: .data returns a tensor that shares memory with the parameter, so writing to either one changes the same values.
(If I am wrong, please correct me :-D)
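A quick way to check the shared-memory claim is sketched below. The layer and its sizes are arbitrary choices for illustration, not something from the original question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Linear(4, 3)  # hypothetical layer, sizes chosen only for the demo

# .data returns a tensor backed by the same storage as the parameter,
# but detached from autograd (requires_grad is False on it).
shares_storage = m.weight.data.data_ptr() == m.weight.data_ptr()
data_requires_grad = m.weight.data.requires_grad

# Either call writes into that same storage, so both initialize the weight.
nn.init.normal_(m.weight)
after_first = m.weight.detach().clone()
nn.init.normal_(m.weight.data)
after_second = m.weight.detach().clone()

print(shares_storage)      # True
print(data_requires_grad)  # False
```

So for plain initialization the two calls end up modifying the same values; the difference only shows up in how autograd sees the modification.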

(Zhenlan Wang) #3


I understand that .data returns the underlying tensor and stops gradient tracking, so I agree that both give the same result when initializing the weight. I guess the question is really “weight vs. weight.data: which one has the correct grad implication?”.




I would avoid using the .data attribute anymore and instead wrap the code in a

with torch.no_grad():

block if necessary.
Autograd cannot warn you if you manipulate the underlying data through .data, which might lead to wrong results.
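A minimal sketch of both points, assuming a small example of my own (the tensors, layer sizes, and std value are made up for illustration): an in-place change through .data goes unnoticed and yields a wrong gradient, the same change under torch.no_grad() is detected, and initialization inside a no_grad block works as intended.

```python
import torch
import torch.nn as nn

# Case 1: in-place change through .data is invisible to autograd.
a = torch.ones(3, requires_grad=True)
out = a.exp()          # exp keeps its output around for the backward pass
out.data.zero_()       # overwrites that saved output without autograd noticing
out.sum().backward()
silent_grad = a.grad.clone()  # all zeros, although the true gradient is exp(1)

# Case 2: the same change under no_grad is tracked, so backward complains.
a2 = torch.ones(3, requires_grad=True)
out2 = a2.exp()
with torch.no_grad():
    out2.zero_()
raised = False
try:
    out2.sum().backward()
except RuntimeError:
    raised = True  # autograd reports the in-place modification

# Recommended pattern for initialization: modify the parameter under no_grad.
m = nn.Linear(4, 3)  # hypothetical layer
with torch.no_grad():
    m.weight.normal_(mean=0.0, std=0.02)
    m.bias.zero_()
```

Note that the functions in torch.nn.init already run under no_grad internally, so torch.nn.init.normal_(m.weight) needs neither .data nor an explicit no_grad block.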

(Zhenlan Wang) #5

Got it. Thank you for your answer.