I have seen people doing both, torch.nn.init.normal_(m.weight) or torch.nn.init.normal_(m.weight.data). Could someone tell me which is the right way and why? Thanks.
Best,
Zhenlan
Hi,
As far as I know, there is no difference between them: .data shares the same underlying memory as the parameter, so both calls modify the same storage.
(If I am wrong, please correct me :-D)
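To illustrate the shared-memory point, here is a minimal sketch: modifying the tensor returned by .data also changes the original tensor, since they share storage.

```python
import torch

t = torch.randn(3, requires_grad=True)
d = t.data          # same underlying storage, grad tracking detached
d.zero_()           # in-place change through .data

print(t)            # t is now all zeros as well
print(d.data_ptr() == t.data_ptr())  # same memory
```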
Hi,
I understand that .data returns the underlying tensor and stops gradient tracking. So I agree that both will give the same result in terms of initializing the weight. I guess the real question is: between weight and weight.data, which one has the correct implication for gradients?
Best,
Zhenlan
I would avoid using the .data attribute and instead wrap the code in a

with torch.no_grad():
    ...

block if necessary. Autograd cannot warn you if you manipulate the underlying tensor through .data, which might lead to wrong results.
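A minimal sketch of the recommended pattern: initialize parameters inside a torch.no_grad() block rather than through .data. The nn.Linear layer here is just an example module.

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 2)

# Initialize without autograd tracking; no .data access needed.
with torch.no_grad():
    nn.init.normal_(m.weight)
    m.bias.zero_()  # in-place ops are safe under no_grad

# The parameters still require grad for subsequent training.
print(m.weight.requires_grad)  # True
```

Unlike mutating .data, an in-place operation on a leaf parameter outside no_grad would raise an error, so autograd can actually warn you about the mistake.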
Got it. Thank you for your answer.