@lakehanne Python's default recursion limit is 1000 (see sys.getrecursionlimit()); either you forgot a termination condition in your code or you need another approach (iterator/generator?)
@Atcold Hi, can we simply use model2's weights to initialize model1's layer by
model1.conv1.parameters = model2.conv1.parameters
? Thank you.
@Xiaoyu_Liu You can check this for example. Furthermore, if you want to copy by module names, you can use named_parameters() or state_dict(), and make sure you deep-copy the tensors.
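A minimal sketch of both routes, assuming two models with identically shaped conv1 layers (the Net class below is made up for illustration):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3)

model1, model2 = Net(), Net()

# Copy by tensor: copy_ writes model2's values into model1's storage.
with torch.no_grad():
    model1.conv1.weight.copy_(model2.conv1.weight)
    model1.conv1.bias.copy_(model2.conv1.bias)

# Or copy by name via the state dict; load_state_dict copies the values,
# so model1 won't share memory with model2 afterwards.
model1.conv1.load_state_dict(model2.conv1.state_dict())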
Where is the implementation of Xavier? Is this the Xavier init:
m.weight.data.normal_(0, math.sqrt(2. / n))
?
Here you go:
How did you know that apply existed? It doesn't seem to be documented anywhere I'm looking.
I guess one can find the official function in the docs:
http://pytorch.org/docs/master/nn.html?highlight=xavier#torch.nn.init.xavier_normal
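For reference, a minimal usage sketch (the layer sizes are arbitrary; xavier_normal_ is the in-place spelling in current PyTorch):

import torch.nn as nn

m = nn.Linear(64, 32)
nn.init.xavier_normal_(m.weight)  # Xavier/Glorot normal init, in-place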
You're right, apply is not documented. I'll open a PR.
Updated: https://github.com/pytorch/pytorch/pull/2327
module.weight.data.copy_(everything_you_want)
Conv2d weights have shape [out_channels, in_channels / groups, kernel_height, kernel_width].
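A small sketch of that (layer sizes made up; the new tensor's shape must match exactly):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
# conv.weight has shape [8, 3, 3, 3] here
new_weights = torch.randn(8, 3, 3, 3)
with torch.no_grad():
    conv.weight.copy_(new_weights)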
It seems there is only the conv layers' initialization in the ResNet model, not the linear layer's. What about the linear layer? Is there a default initialization if I don't define one in my model?
Thank you!
Looks like glorot uniform is the default initialization for nn.Linear. Check out reset_parameters().
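For example, calling it again re-draws the default initialization (a quick sketch):

import torch.nn as nn

lin = nn.Linear(2, 3)
lin.reset_parameters()  # re-draws weights and bias from the default uniform scheme
print(lin.weight)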
Code god! May I ask a question: how can I initialize an LSTM or a CNN? I know an LSTMCell can be initialized with nn.init.xavier_uniform(LSTMCell.bias_ih) or nn.init.xavier_uniform(LSTMCell.bias_hh), but nn.LSTM can't be initialized that way…
according to dir(net) there are two weight matrices:
weight_hh_l0
weight_ih_l0
so you could do
import torch.nn as nn

def initialize_weights(model):
    if isinstance(model, nn.Linear):
        nn.init.xavier_normal(model.weight.data)
    elif isinstance(model, (nn.LSTM, nn.RNN, nn.GRU)):
        # this only covers the first layer (l0); see below for deeper stacks
        nn.init.xavier_normal(model.weight_hh_l0)
        nn.init.xavier_normal(model.weight_ih_l0)
This is also documented here. So if you have more than one recurrent layer, you'll have to initialize 2 * num_layers weight matrices. Not sure how to elegantly iterate over them; using list(parameters()) and only changing 2-dimensional tensors (i.e. not biases) could be a solution, as in the sketch below.
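A sketch of that idea (layer sizes are arbitrary; nn.init.xavier_normal_ is the in-place spelling in current PyTorch):

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=3)

# Xavier-init every 2-D parameter (weight_ih_l*, weight_hh_l*),
# leaving the 1-D bias vectors at their defaults.
for name, param in lstm.named_parameters():
    if param.dim() == 2:
        nn.init.xavier_normal_(param)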
How about multiple layers? I can initialize one layer, but I can't initialize multiple layers using a loop. I have tried several ways, but none of them work…
You can call the weight_init method on your model using model.apply(weight_init).
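For instance, reusing the initialize_weights function from above (the model here is just a placeholder):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 5),
)
# apply() walks every submodule recursively and calls the function on each
model.apply(initialize_weights)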
Does batch normalization help with different weight initializations? I have to ensure I include at least one of Xavier's and Kaiming's initializations. Can anyone please help me with this?
Hi,
I just read these answers and I have a question. With
a = nn.Linear(2, 3)
I get a fc layer with 2-d input and 3-d output, and the params in the layer are from a N(0, 1) without my initialization, i.e. the fc layer is initialized automatically.
Can I specify an initialization method like Xavier when I create the layer, or do I have to get the params iteratively and change them with a function, or with the help of the apply method?
The Linear layer will return a 2-dimensional output with 3 output features.
It's initialized automatically using the uniform_ distribution. You can see the initialization here.
You could override the layer and implement your own method, or just use apply to init it with Xavier or another method.
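A quick sketch of the apply route (init_weights is a made-up helper name):

import torch.nn as nn

a = nn.Linear(2, 3)

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

a.apply(init_weights)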
There's no such thing as "should" here; feel free to ask anything.