Initializing parameters with weight or

(Zhenlan Wang) #1

I have seen people doing both, torch.nn.init.normal_(m.weight) or torch.nn.init.normal_( Could someone tell me which is the right way and why? Thanks.





As far as I can tell, there is no difference between them: .data shares memory with the original tensor, so both calls write to the same weights.
(If I am wrong, please correct me :-D)
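
To make the shared-memory point concrete, here is a minimal sketch. The layer `m` and its sizes are made up for illustration and are not from the original question. It checks that `.data` points at the same storage as the parameter, and that calling an `nn.init` function on the parameter directly leaves it a normal leaf tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Linear(4, 2)  # hypothetical layer, only for illustration

# .data refers to the same storage as the parameter, but with requires_grad=False.
same_storage = m.weight.data.data_ptr() == m.weight.data_ptr()
data_requires_grad = m.weight.data.requires_grad

# Writing through .data therefore changes the parameter itself.
m.weight.data.fill_(0.5)
weight_changed = bool((m.weight == 0.5).all())

# nn.init functions can also be called on the parameter directly;
# the parameter stays a leaf tensor that requires grad.
nn.init.normal_(m.weight)
still_leaf = m.weight.is_leaf and m.weight.requires_grad

print(same_storage, data_requires_grad, weight_changed, still_leaf)
```

So for plain initialization both spellings end up modifying the same values.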

(Zhenlan Wang) #3


I understand that .data gets the underlying tensor and stops the grad tracking, so I agree that both will give the same result in terms of initializing the weight. I guess the question is really “weight vs, which one has the correct grad implication?”.




I would avoid using the .data attribute anymore and instead wrap the code in a

with torch.no_grad():

block if necessary.
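
As a rough sketch of what that could look like for a manual in-place initialization (the layer `m` and the `std` value are assumptions chosen only for the example):

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 2)  # hypothetical layer, only for illustration

# Without no_grad, an in-place op on a leaf that requires grad is rejected.
try:
    m.weight.normal_()
    raised = False
except RuntimeError:
    raised = True

# Inside no_grad the same in-place ops are allowed, and .data is not needed.
with torch.no_grad():
    m.weight.normal_(mean=0.0, std=0.02)  # std picked arbitrarily
    m.bias.zero_()

print(raised, m.weight.requires_grad, m.weight.grad_fn)
```

The parameters still require grad afterwards and have no grad_fn, so training proceeds as usual.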
Autograd cannot warn you if you manipulate the underlying data through .data, which might lead to silently wrong results.
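
Here is a small sketch of that failure mode. The tensors are made up for the example; sigmoid is used because its backward pass needs the saved output. Modifying the output through .data goes unnoticed and gives a wrong gradient, while the same modification inside a no_grad block is detected during backward.

```python
import torch

# Case 1: modify the output through .data. Autograd does not notice.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = x.sigmoid()
out.data.zero_()
out.sum().backward()
grad_via_data = x.grad.clone()  # all zeros, which is not the true sigmoid gradient

# Case 2: the same in-place change inside no_grad. Autograd sees that a
# tensor it saved for backward was modified and raises an error.
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out2 = x2.sigmoid()
with torch.no_grad():
    out2.zero_()
try:
    out2.sum().backward()
    detected = False
except RuntimeError:
    detected = True

print(grad_via_data, detected)
```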

(Zhenlan Wang) #5

Got it. Thank you for your answer.