Initializing parameters with weight or weight.data?


(Zhenlan Wang) #1

I have seen people do both torch.nn.init.normal_(m.weight) and torch.nn.init.normal_(m.weight.data). Could someone tell me which is the right way and why? Thanks.
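For concreteness, here is the sort of code I mean; the model and the model.apply() setup are just an example I made up:

import torch.nn as nn

# A custom init function applied via model.apply(); m is each submodule
def weights_init(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight)         # variant 1: the parameter itself
        # nn.init.normal_(m.weight.data)  # variant 2: the underlying tensor

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
model.apply(weights_init)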

Best,

Zhenlan


#2

Hi,

In my view, there is no difference between them: .data shares the same underlying memory as the parameter.
(If I am wrong, please correct me :-D)
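For example, a quick check that they point to the same memory (the Linear layer is just an example):

import torch
import torch.nn as nn

m = nn.Linear(4, 4)

# .data shares the same underlying storage as the parameter ...
print(m.weight.data.data_ptr() == m.weight.data_ptr())  # True

# ... so initializing through .data changes the parameter's values too
torch.nn.init.normal_(m.weight.data)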


(Zhenlan Wang) #3

Hi,

I understand that .data gets the underlying tensor and stops the grad tracking, so I agree that both will give the same result in terms of initializing the weight. I guess the question is really “weight vs. weight.data: which one has the correct implications for the gradients?”.
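To make concrete what I mean by “stops the grad tracking”, a small check:

import torch

w = torch.ones(3, requires_grad=True)
print(w.requires_grad)       # True
print(w.data.requires_grad)  # False: .data is detached from the graph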

Best,

Zhenlan


#4

I would avoid using the .data attribute and instead wrap the code in a

with torch.no_grad():
    ...

block if necessary.
Autograd cannot warn you if you manipulate the underlying tensor via .data, which might lead to silently wrong results.
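For example, a minimal sketch of both points; the layer and the numbers are made up for illustration:

import torch
import torch.nn as nn

# Recommended: modify parameters inside a torch.no_grad() block
m = nn.Linear(4, 4)
with torch.no_grad():
    m.weight.normal_(mean=0.0, std=0.02)

# Why .data is risky: autograd silently uses the mutated values
w = torch.ones(3, requires_grad=True)
y = (w * w).sum()  # autograd saves w for the backward pass
w.data.mul_(10)    # mutate w behind autograd's back; no error is raised
y.backward()
print(w.grad)      # tensor([20., 20., 20.]) instead of the correct [2., 2., 2.]

The same in-place change on w.detach() would raise a RuntimeError during the backward pass, because detach() shares the version counter with w while .data does not.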


(Zhenlan Wang) #5

Got it. Thank you for your answer.