The default Linear layer weight initialization mechanism isn’t clear to me.
If I rely on the default initialization, without calling torch.nn.init.XX or reset_parameters(), I get different weight values than when I initialize explicitly.
Consider this code:
# init_explore.py
# PyTorch 0.4 Anaconda3 4.1.1 (Python 3.5.2)
# explore layer initializations

import torch as T

class Net1(T.nn.Module):
    # default weight initialization
    def __init__(self):
        super(Net1, self).__init__()
        self.fc1 = T.nn.Linear(4, 5)

class Net2(T.nn.Module):
    # explicit nn.init
    def __init__(self):
        super(Net2, self).__init__()
        self.fc1 = T.nn.Linear(4, 5)
        x = 0.5  # 1. / sqrt(4)
        T.nn.init.uniform_(self.fc1.weight, -x, x)
        T.nn.init.uniform_(self.fc1.bias, -x, x)

# -----------------------------------------------------------

def main():
    print("\nBegin Init explore with PyTorch \n")

    T.manual_seed(1)
    net1 = Net1()
    # net1.fc1.reset_parameters()
    print("Default init weights: ")
    print(net1.fc1.weight)

    T.manual_seed(1)
    net2 = Net2()
    print("\n\nExplicit nn.init.uniform_ weights: ")
    print(net2.fc1.weight)

    print("\n\nEnd Init explore")

if __name__ == "__main__":
    main()
Even though both networks are created right after T.manual_seed(1), their printed weight values differ. If I uncomment the net1.fc1.reset_parameters() statement, the weight values are the same.
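One way to narrow this down might be the sketch below. It rests on an assumption I have not confirmed: that the T.nn.Linear constructor already draws its initial weights from the global random number generator, so any explicit nn.init call made afterwards sees a generator that has moved on. The helper names default_weights and explicit_weights are made up for this illustration. The sketch compares the default weights against an explicit uniform_ init, once without and once with re-seeding just before the explicit call.

```python
import torch as T

def default_weights(seed):
    # Rely only on whatever initialization the Linear constructor performs.
    T.manual_seed(seed)
    return T.nn.Linear(4, 5).weight.detach().clone()

def explicit_weights(seed, reseed):
    # Hypothetical helper: build the layer, then overwrite its weights explicitly.
    T.manual_seed(seed)
    fc = T.nn.Linear(4, 5)  # assumption: the constructor consumes random numbers here
    if reseed:
        T.manual_seed(seed)  # rewind the generator before the explicit init
    bound = 0.5  # 1. / sqrt(4), i.e. 1 / sqrt(number of inputs)
    T.nn.init.uniform_(fc.weight, -bound, bound)
    return fc.weight.detach().clone()

if __name__ == "__main__":
    w_default = default_weights(1)
    print("same without re-seeding:",
          T.allclose(w_default, explicit_weights(1, reseed=False)))
    print("same with re-seeding:   ",
          T.allclose(w_default, explicit_weights(1, reseed=True)))
```

If only the re-seeded variant matches the default weights, that would suggest the two initializations use the same range and the mismatch comes from the generator state rather than from a different distribution.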
Is this correct behavior?
(apologies in advance for any etiquette blunders – this is my first post)