Models give same output despite having different input

I’m somewhat new to PyTorch, so apologies if I’m missing something obvious here.

I am trying to create two models from the same neural network template, but feed them different initial inputs so that they diverge. With my code, however, I find the state_dicts of the two models are still identical after the initialization step.

Parts of my code are abbreviated, but I can elaborate further if needed.

# Creating models
model1 = CrystalGraphConvNet(orig_atom_fea_len, nbr_fea_len, ...)
model2 = CrystalGraphConvNet(orig_atom_fea_len, nbr_fea_len, ...)

optimizer1 = optim.SGD(model1.parameters(), args.lr,
                       momentum=args.momentum,
                       weight_decay=args.weight_decay)
optimizer2 = optim.SGD(model2.parameters(), args.lr,
                       momentum=args.momentum,
                       weight_decay=args.weight_decay)

def train(train_loader, epoch):
    # input1, target1 and input2, target2 come from zipping pairs of train_loader rows

    init_input1 = (Variable(input1[0]), Variable(input1[1]), input1[2], input1[3])
    init_input2 = (Variable(input2[0]), Variable(input2[1]), input2[2], input2[3])

    target_normed1 = normalizer.norm(target1)
    target_normed2 = normalizer.norm(target2)
    init_target1 = Variable(target_normed1)
    init_target2 = Variable(target_normed2)

    # Initialization step
    optimizer1.zero_grad()
    output1 = model1(*init_input1)
    loss1 = criterion(output1, init_target1)
    loss1.backward()
    optimizer1.step()
    init_state1 = model1.state_dict()

    optimizer2.zero_grad()
    output2 = model2(*init_input2)
    loss2 = criterion(output2, init_target2)
    loss2.backward()
    optimizer2.step()
    init_state2 = model2.state_dict()

Despite calling optimizer.step() to apply the parameter updates, init_state1 and init_state2 come out the same afterwards.

I’ve tested this with random pairs of initial inputs, so there shouldn’t be an issue with how the inputs are paired. I’ve also tried model2 = copy.deepcopy(model1) to no avail, so I don’t think there’s an aliasing issue either.

I think we’d need more info. Some things to consider: What’s your learning rate and criterion? Are you sure input1 and input2 actually differ? How did you initialize the weights of your models? Are the outputs of your models finite (not NaN and not inf) and reasonable to learn from (e.g. not just zeros)? Maybe add some asserts: if your inputs are random, your loss values should differ too, and you can verify that before comparing the state_dicts.
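For example, something along these lines (a rough sketch reusing the names from your snippet, adjust as needed) would catch most of these early:

import torch

# Sketch only -- assumes input1/input2 are tuples of tensors as in your code.
assert not torch.equal(input1[0], input2[0]), "inputs look identical"

output1 = model1(*init_input1)
output2 = model2(*init_input2)
assert torch.isfinite(output1).all() and torch.isfinite(output2).all()

loss1 = criterion(output1, init_target1)
loss2 = criterion(output2, init_target2)
print(loss1.item(), loss2.item())  # for random inputs these should differ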

FYI, you shouldn’t need Variable; it has been deprecated since PyTorch 0.4.
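Assuming input1 and target_normed1 are already tensors, you can just pass them straight through:

# Tensors carry autograd information themselves since PyTorch 0.4,
# so the Variable wrapper can simply be dropped.
init_input1 = (input1[0], input1[1], input1[2], input1[3])
init_target1 = target_normed1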

Q. When are you comparing the values of init_state1 and init_state2? If it’s after everything is done, you may effectively be comparing live tensors: the state dict holds references to the parameter tensors, which keep changing after every optimizer step, so you’ll always see each model’s current parameters rather than a snapshot.
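If that turns out to be the issue, something like this (just a sketch) takes an actual snapshot instead of holding references to the live tensors:

import copy

# Detach the snapshot from the live parameters so later updates
# to the model don't change it.
init_state1 = copy.deepcopy(model1.state_dict())

# Per-tensor equivalent:
init_state1 = {k: v.detach().clone() for k, v in model1.state_dict().items()}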

I’ve looked into it further, and it turns out the losses and outputs after the initialization step really are different. A boolean check / assert on equality fails for me, so my eyeball comparison of the very large init state_dicts was simply unreliable.

For example, the first tensor in the first state dict was ('embedding.weight', tensor([[ 0.0645, -0.0322, -0.0537, ..., -0.0514, 0.0883, -0.0516], and the first tensor in the second state dict was ('embedding.weight', tensor([[ 0.0645, -0.0318, -0.0537, ..., -0.0514, 0.0883, -0.0516]. Sorry for the oversight there.
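For anyone comparing large state dicts later, a small helper like this (a rough sketch) is more reliable than eyeballing the printed tensors:

import torch

def state_dicts_equal(sd1, sd2):
    """True only if both dicts have the same keys and every tensor matches exactly."""
    if sd1.keys() != sd2.keys():
        return False
    return all(torch.equal(sd1[k], sd2[k]) for k in sd1)

print(state_dicts_equal(init_state1, init_state2))  # False in my case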

I also had a similar issue later on in my code, but I was able to solve it just by typing out the problem; I’ll leave my problem and solution here in case anyone finds it useful.

After my initialization step is done, I feed both models a pair of inputs inside a for loop to create 4 outputs, i.e. model1(input1), model1(input2), model2(input1), model2(input2). I then calculate the losses and backpropagate them, but I do not call optimizer.step() to make the model changes permanent. Instead, I save the current state dicts into .pth.tar files for later comparison by a comparison function.

Unfortunately, I ran into an issue where the outputs and state_dicts coming from the same model were identical despite corresponding to different inputs, e.g. output1 = output3 and output2 = output4.

optimizer1.zero_grad()
output1 = model1(*input_var1)
loss1 = criterion(output1, target_var1)
loss1.backward()
state1 = model1.state_dict()

optimizer1.zero_grad()
output3 = model1(*input_var2)
loss3 = criterion(output3, target_var2)
loss3.backward()
state3 = model1.state_dict()

# And so on for output2 = model2(input1) and output4 = model2(input2)
# ...
state_files = ["eval1.pth.tar", "eval2.pth.tar", "eval3.pth.tar", "eval4.pth.tar"]
states = [state1, state2, state3, state4]
for index, state in enumerate(states):
    torch.save(state, state_files[index])

The issue was that state1 and state2 were references to the models' live parameter tensors, not copies: model.state_dict() returns a dict whose values point at the current parameters, so they were getting updated when output3 and output4 were calculated (somewhat similar to what @dhruvbird described).

Thus, I was getting state1 = state3 and state2 = state4 in my comparison. I fixed this by adding torch.save() after each model evaluation, instead of doing all the saving at the end of the evaluations.
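Deep-copying the state dict at evaluation time would also have worked (a sketch of the alternative), since the copy stops tracking the live parameters:

import copy

# Snapshot immediately after the evaluation; the copy won't change
# when the same model is run on the next input.
state1 = copy.deepcopy(model1.state_dict())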

I appreciate everyone pointing me in the right direction on this.