What’s the default initialization method for layers like conv, fc, and RNN layers? Are they just initialized to all zeros?
All the layers are implemented in this folder: https://github.com/pytorch/pytorch/tree/master/torch/nn/modules
The initialization depends on the layer; for example, the linear one is here.
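As a quick sanity check (a minimal sketch using only the public API; the exact values will differ per layer and per run), you can construct a few layer types and look at their freshly initialized weights. None of them come out as all zeros, and each layer type uses its own scheme:

import torch.nn as nn

# Each layer type initializes its parameters on construction; nothing is left at zero.
fc = nn.Linear(in_features=100, out_features=10)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
rnn = nn.RNN(input_size=100, hidden_size=10)

for name, w in [("fc", fc.weight), ("conv", conv.weight), ("rnn", rnn.weight_ih_l0)]:
    # Ranges are non-zero and depend on the layer's fan-in, not on any activation.
    print(name, w.min().item(), w.max().item())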
Thank you so much!
I see only torch.Tensor(...) without any specific initialization method. I wonder what it would be?
In reset_parameters(), the weights are set/reset.
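In other words, the torch.Tensor(...) call only allocates the parameter; the values come from reset_parameters(), which __init__ calls and which you can also call again yourself. A small sketch:

import torch.nn as nn

fc = nn.Linear(in_features=256, out_features=128)

# The weights are already initialized here, because __init__ calls reset_parameters().
print(fc.weight.abs().max().item())

# Calling reset_parameters() again simply re-draws the values with the same scheme,
# which is handy when re-running an experiment with the same module instance.
fc.reset_parameters()
print(fc.weight.abs().max().item())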
Are these initializations basically He init or Xavier init?
Tanh --> Xavier
ReLU --> He
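If it helps, this rule of thumb lines up with the per-nonlinearity gains that torch.nn.init recommends; a quick sketch using the documented init.calculate_gain helper:

import math
from torch.nn import init

# Recommended scaling gains per nonlinearity; these are what the Xavier/He-style
# schemes plug in, e.g. init.xavier_uniform_(w, gain=init.calculate_gain("tanh")).
print(init.calculate_gain("tanh"))   # 5/3
print(init.calculate_gain("relu"))   # sqrt(2)
print(math.sqrt(2))                  # for comparison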
So PyTorch uses He when it’s ReLU? I’m confused about what PyTorch does.
Sorry ptrblck, I’m confused… does PyTorch use Xavier or He depending on the activation? That’s what klory seems to imply, but the code looks as follows:
def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)
Tanh → Xavier
ReLU → He
No, that’s not correct. PyTorch’s initialization is based on the layer type, not the activation function (the layer doesn’t know about the activation at weight-initialization time).
For the linear layer, this would be somewhat similar to He initialization, but not quite:
def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)
I.e., if I remember correctly, He init is “sqrt(6 / fan_in)”, whereas in the PyTorch Linear layer it’s “1. / sqrt(fan_in)”.
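For completeness, if you do want activation-aware initialization along the lines of the Tanh → Xavier / ReLU → He rule above, you have to apply it yourself after construction. A rough sketch using the public torch.nn.init functions and Module.apply (the init_weights helper and the network sizes are made up for illustration):

import torch.nn as nn

def init_weights(module):
    # PyTorch's defaults don't look at the activation, so an activation-aware
    # scheme has to be applied explicitly. Here: He/Kaiming for a ReLU network;
    # for a tanh network one would swap in nn.init.xavier_uniform_ instead.
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # .apply() visits every submodule recursively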
Yeah, you’re correct. I just checked their code for linear.py and conv.py; it seems they’re all using Xavier, right? (I got the Xavier explanation from here.)
Doesn’t Xavier also include fan_out, though? Here, I can only see the input channels, not the output channels.
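To make the difference concrete, here is just the arithmetic on the three uniform bounds, with fan_in = 256 and fan_out = 128 as made-up sizes: Xavier uses both fans, He uses only fan_in, and the default Linear bound is yet another scale.

import math

fan_in, fan_out = 256, 128  # made-up layer sizes, for illustration only

xavier_bound = math.sqrt(6.0 / (fan_in + fan_out))  # Glorot/Xavier uniform bound
he_bound = math.sqrt(6.0 / fan_in)                  # He/Kaiming uniform bound
default_bound = 1.0 / math.sqrt(fan_in)             # the reset_parameters() bound shown above

# Prints roughly: xavier=0.1250 he=0.1531 default=0.0625 -> three different scales.
print(f"xavier={xavier_bound:.4f} he={he_bound:.4f} default={default_bound:.4f}")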
So is it just an unpublished, made-up PyTorch init?
Maybe I am overlooking something or don’t recognize it, but I think so.
A validation from someone on the PyTorch team would be nice. Calling for master @SimonW
No, they are actually from well-established published papers. E.g., the linear init is from “Efficient Backprop”, LeCun’99.
Thanks! I appreciate it.
This one, I assume:
Maybe it would be worthwhile adding comments to the docstrings? It would make it easier for the next person to find, plus more convenient to refer to in papers.