Like others, I have found that manually initializing weights in-place works when done through `weight.data` but not through `weight` alone. I am trying to understand this more deeply. The following code shows three possible (not necessarily equivalent) approaches and whether they work:
```python
import torch
import torch.nn as nn
import torch.nn.init as init

def do_init(layer):
    # works
    init.xavier_uniform_(layer.weight)
    # does not work
    #layer.weight.random_()
    # works
    #layer.weight.data.random_()

lin = nn.Linear(4, 5)
print(lin.weight)
do_init(lin)
print(lin.weight)
```
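For context, the failing case raises a runtime error. A minimal reproduction (the exact message may vary across PyTorch versions):

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 5)
try:
    lin.weight.random_()  # in-place op on a leaf tensor that requires grad
except RuntimeError as e:
    # Prints something like:
    # "a leaf Variable that requires grad is being used in an in-place operation."
    print(e)
```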
- Accessing `weight` directly causes an error about an in-place edit of a leaf variable. But how is accessing `weight.data` any different? How is it treated differently under the hood? (See the quick check after this list.)
- Is an in-place edit through `data` actually safe, or does it confuse the computation graph? It seems we are tricking PyTorch into letting us manually initialize the weights, but I am not comfortable tricking PyTorch without understanding the mechanism.
- What do the `nn.init` in-place methods do to avoid the "leaf variable" error? Do they also edit the `data`?
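Here is the quick check mentioned in the first bullet. If I understand correctly, `weight.data` seems to share storage with `weight` but reports `requires_grad=False`, which may be why the in-place edit is allowed (a minimal sketch of what I observed, not an explanation of the mechanism):

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 5)
print(lin.weight.requires_grad)       # True  -> in-place ops raise the leaf-variable error
print(lin.weight.data.requires_grad)  # False -> in-place ops are allowed
# Both appear to point at the same underlying storage:
print(lin.weight.data_ptr() == lin.weight.data.data_ptr())  # True
```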
I tried looking at the `nn.init` source code but could not determine this for sure.
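For what it is worth, wrapping the in-place call in `torch.no_grad()` also avoids the error, though I do not know whether that is what `nn.init` does internally or whether it is equivalent to the `.data` approach:

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 5)
with torch.no_grad():
    lin.weight.random_()  # no leaf-variable error inside no_grad
print(lin.weight)
```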
Thank you!