Initializing parameters with weight or weight.data?

I have seen people doing both, torch.nn.init.normal_(m.weight) or torch.nn.init.normal_(m.weight.data). Could someone tell me which is the right way and why? Thanks.

Best,

Zhenlan,

1 Like

Hi,

As far as I know, there is no practical difference between them here. `.data` shares the same underlying memory as the tensor.
(If I am wrong, please correct me :-D)

Hi,

I understand that `.data` gives access to the underlying tensor without gradient tracking, so I agree that both will give the same result in terms of initializing the weight. I guess the real question is: between `weight` and `weight.data`, which one has the correct implications for autograd?

Best,

Zhenlan

1 Like

I would avoid using the .data attribute and instead wrap the code in a

with torch.no_grad():
    ...

block if necessary.
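A minimal sketch of this pattern (the layer shape and init values are arbitrary, chosen just for illustration): the `nn.init` functions already run without gradient tracking internally, but wrapping in `torch.no_grad()` also makes any extra manual manipulation of the parameter safe.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

with torch.no_grad():
    # Initialize the weight in place; no operation here is recorded
    # by autograd, so no in-place error is raised on the leaf parameter.
    nn.init.normal_(layer.weight, mean=0.0, std=0.02)
    layer.weight += 0.01  # manual tweak, also safely untracked

# The parameter still requires grad for training afterwards.
print(layer.weight.requires_grad)  # True
```

Note that the parameter itself stays a leaf with `requires_grad=True`; `no_grad()` only suppresses tracking for the operations inside the block.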
Autograd cannot warn you if you manipulate the underlying tensor via .data, which might lead to silently wrong results.

3 Likes

Got it. Thank you for your answer.