Hi everyone,
I’m new to PyTorch/ML, so I hope I’m missing something simple. I’m trying to create a multi-output neural network for regression, starting with just a single linear layer from the inputs to the outputs, essentially to see how everything works. The model has 8 input features and 4 outputs.
I eventually got into the weeds with this approach, trying to show that if I train two identical models with that architecture, their weights and biases end up the same. I want to track down the sources of randomness within PyTorch so I can control them once I get to bigger models.
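For context, the only seeding I do right now is torch.manual_seed. My rough understanding (an assumption on my part, not something I’ve verified) is that a fuller reproducibility setup would look something like this sketch; I’m on CPU, so the CUDA-related pieces probably don’t matter here:
import random
import numpy as np
import torch
random.seed(0)                            # Python's built-in RNG
np.random.seed(0)                         # NumPy RNG, in case the dataset pipeline uses it
torch.manual_seed(0)                      # PyTorch CPU (and CUDA) RNG
torch.use_deterministic_algorithms(True)  # prefer deterministic kernels where available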
The code essentially goes like this:
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, SequentialSampler
from torch.nn.utils import weight_norm

device = torch.device("cpu")
torch.manual_seed(0)
#create dataloader from a dataset. It is 1000 samples with 8 features and 4 outputs for each sample.
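(In my real code, myDataset is loaded from a file; a hypothetical random stand-in like the following should reproduce the same behaviour, I think. X_all and y_all are names I’m making up for illustration.)
X_all = torch.randn(1000, 8)   # 1000 samples, 8 features each
y_all = torch.randn(1000, 4)   # 4 targets per sample
myDataset = torch.utils.data.TensorDataset(X_all, y_all)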
dataloader = DataLoader(myDataset, batch_size=500, sampler=SequentialSampler(myDataset))
#Initialize networks
#I just noticed I don't send the networks to the 'device', but I'm on CPU for this test.
NN1 = myNeuralNet(IN=8,OUT=4)
NN2 = myNeuralNet(IN=8,OUT=4)
#There is only one Linear layer in each model, wrapped in weight_norm, so each model should only need one initialization. In other words, weight_norm(nn.Linear(IN, OUT)) is the entire network.
#There are no activation functions.
nn.init.ones_(NN1.layers[0].weight)
nn.init.ones_(NN1.layers[0].bias)
nn.init.ones_(NN2.layers[0].weight)
nn.init.ones_(NN2.layers[0].bias)
#I don't think I need an independent loss fcn for each one, but I'm just being safe
loss_fn1 = nn.MSELoss(reduction='sum')
loss_fn2 = nn.MSELoss(reduction='sum')
optimizer1 = optim.Adam(NN1.parameters())
optimizer2 = optim.Adam(NN2.parameters())
#train both models (defined below)
train_epoch(NN1,dataloader,loss_fn1,optimizer1)
train_epoch(NN2,dataloader,loss_fn2,optimizer2)
#After this single epoch of training, the two networks' weights and biases should be identical, but they are not.
def train_epoch(model, dataloader, loss_fn, optimizer):
    model.train()
    for batch_num, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        y_hat = model.forward(X)
        loss = loss_fn(y_hat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return None
#Finally, definition of network setup and forward:
class myNeuralNet(nn.Module):
    def __init__(self, IN, OUT):
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers = nn.Sequential(weight_norm(nn.Linear(IN, OUT, bias=True)))

    def forward(self, x):
        x = self.flatten(x)
        y = self.layers(x)
        return y
And that’s the meat and potatoes of what’s happening. The two networks are bare-bones and should be identical, but they aren’t. They do eventually converge to within 1e-6 of each other, but I’m confused as to why they aren’t exactly the same epoch for epoch. I would assume that two models with the same data, the same initial weights/biases, the same optimizers, and the same loss functions would update their weights and biases identically, but they don’t.
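For reference, the kind of comparison I’m doing after training is roughly this (a sketch, not my exact code; NN1 and NN2 are the models from above):
# Compare corresponding parameters of the two trained models
for (name, p1), (_, p2) in zip(NN1.named_parameters(), NN2.named_parameters()):
    max_diff = (p1 - p2).abs().max().item()
    print(f"{name}: identical={torch.equal(p1, p2)}, max abs diff={max_diff:.2e}")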
What am I misunderstanding?
I also want to add that I fully understand that models with a difference this small will probably perform the same, but I want to understand the inner workings of PyTorch better before I start making those assumptions. I also understand that a neural network without activation functions or hidden layers is probably next to useless, but that is intentional, for the same reason as above.
Thanks in advance!