Linear layer default weight initialization

James_McCaffrey · August 21, 2018, 6:35pm

The default Linear layer weight initialization mechanism isn’t clear to me.

If I use the default initialization, without calling torch.nn.init.XX or reset_parameters(), I get different weight values than when I initialize explicitly.

Consider this code:

# init_explore.py
# PyTorch 0.4 Anaconda3 4.1.1 (Python 3.5.2)
# explore layer initializations

import torch as T

class Net1(T.nn.Module):
  # default weight initialization
  def __init__(self):
    super(Net1, self).__init__()
    self.fc1 = T.nn.Linear(4, 5) 

class Net2(T.nn.Module):
  # explicit nn.init
  def __init__(self):
    super(Net2, self).__init__()
    self.fc1 = T.nn.Linear(4, 5) 
    x = 0.5  # 1. / sqrt(4)
    T.nn.init.uniform_(self.fc1.weight, -x, x)
    T.nn.init.uniform_(self.fc1.bias, -x, x)

# -----------------------------------------------------------

def main():
  print("\nBegin Init explore with PyTorch \n")

  T.manual_seed(1)
  net1 = Net1()
  # net1.fc1.reset_parameters()
  print("Default init weights: ")
  print(net1.fc1.weight)

  T.manual_seed(1)
  net2 = Net2()
  print("\n\nExplicit nn.init.uniform_ weights: ")
  print(net2.fc1.weight)

  print("\n\nEnd Init explore")

if __name__ == "__main__":
  main()

The weight values of the two networks are different. If the net1.fc1.reset_parameters() statement is uncommented, the weight values are the same.

Is this correct behavior?

(apologies in advance for any etiquette blunders – this is my first post)

James_McCaffrey · August 21, 2018, 6:38pm

(from the poster – sorry about the formatting – I have no idea what went wrong . . . )

ptrblck · August 21, 2018, 8:51pm

You can add code using three backticks (```).
I’ve formatted your code for you.

The different values are due to an additional “random operation” for Net2.
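For Net1, you set the random seed directly before the parameters are sampled. For Net2, creating the linear layer already samples its parameters with the default initialization, so the explicit T.nn.init.uniform_ calls sample again from a random generator that has already advanced.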
Add T.manual_seed(1) directly before T.nn.init.uniform_ in Net2 and you will get the same values.
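
To see the effect in isolation, here is a minimal sketch that does not depend on which default scheme Linear uses in a given PyTorch version. It only assumes that constructing a Linear layer draws from the global random generator, and it runs on the CPU. The tensor names a, b and c are just for illustration.

```python
import torch as T

# Reference: seed, then sample immediately.
T.manual_seed(1)
a = T.empty(5, 4).uniform_(-0.5, 0.5)

# Seed, construct a layer (its default init consumes random numbers),
# then sample. The generator has advanced, so the values differ from a.
T.manual_seed(1)
_ = T.nn.Linear(4, 5)
b = T.empty(5, 4).uniform_(-0.5, 0.5)

# Same as above, but re-seed directly before sampling.
T.manual_seed(1)
_ = T.nn.Linear(4, 5)
T.manual_seed(1)
c = T.empty(5, 4).uniform_(-0.5, 0.5)

print(T.equal(a, b))  # False: extra draws happened in between
print(T.equal(a, c))  # True: generator state was reset before sampling
```

The same reasoning applies inside Net2: the call to T.nn.Linear(4, 5) is the "additional random operation", so the seed has to be set again after it if the explicit initialization should reproduce a given sequence.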

James_McCaffrey · August 21, 2018, 9:03pm

Ah! Yes, that works. Thank you ptrblck – this was driving me crazy. JM