Strange behavior of my neural network training routine

Hello community,

I am working on a simulation with several interacting neural networks. Below is the training procedure for the whole system.
Without going into the details of what I am trying to do: why does the term theta_1.data - theta_m.data evaluate to 0 the whole time (see code below)? It is supposed to update the parameters theta_m of the model stored in models[-1].
And even if I accept that: if this difference is 0 the whole time, why does theta_m still change during training (I print it before and after)? Something is going terribly wrong with my Python here. My model() is a simple neural network, also posted below.

Can someone see what goes wrong just from this?

Best,
PiF

import copy

import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim


def func(trainingdata, testdata, model, nrepochs, nrworkers,
         batchsize, tau, eta, alpha, weight_decay):
    
    ## Create Batch Collections.
    train_batches = torch.utils.data.DataLoader(trainingdata, 
                                                batch_size=batchsize, 
                                                shuffle=True)
    test_batches = torch.utils.data.DataLoader(testdata, 
                                                batch_size=batchsize, 
                                                shuffle=False)
    
    models = [model()] * (nrworkers+1)  # list of worker and master networks.
    
    ## Prepare Output, Loss against Epoch.
    epoch_loss_train = np.empty(nrepochs+1)
    epoch_loss_train[0] = models[-1].test(train_batches)
    epoch_loss_test = np.empty(nrepochs+1)
    epoch_loss_test[0] = models[-1].test(test_batches)
    
    t = np.zeros(nrworkers, dtype=int)  # current timestep for each worker.
    

    # Print some model parameter before training to see whether training process changes it
    #print("Theta_m 0 before training: {}".format(list(models[-1].parameters())[2].data))
   
    ## Training Procedure.
    for epoch in range(nrepochs):

        ## Iterate over Training Set, Batch by Batch.
        for (batchidx, (features, targets)) in enumerate(train_batches):

            random_p = np.random.randint(nrworkers)  # choose random worker.
            t[random_p] += 1

            ## Help Networks.
            worker_copy = copy.deepcopy(models[random_p])
            worker_copy2 = copy.deepcopy(models[random_p])
            
            if (t[random_p] % tau) == 0:  # communication with master.
                ## Iterate over Parameters in Workercopy, Worker, and Master.
                for (theta_1, theta_2, theta_m) in zip(worker_copy.parameters(),
                                                       models[random_p].parameters(),
                                                       models[-1].parameters()):
                    print("DIFF: ",theta_1.data - theta_m.data) # this evaluates to 0 the whole time??
                    theta_2.data -= alpha * (theta_1.data - theta_m.data)  # update worker.
                    theta_m.data += alpha * (theta_1.data - theta_m.data)  # update master.

            
            ## Update Workercopy with ordinary SGD.
            optimizer = optim.SGD(worker_copy.parameters(), lr=eta, weight_decay=weight_decay)
            optimizer.zero_grad()
            loss = F.nll_loss(worker_copy.forward(features), targets)
            loss.backward()
            optimizer.step()
            ## Incorporate the SGD Updates into Worker.
            for (theta_1, theta_2, theta_3) in zip(models[random_p].parameters(),
                                                worker_copy.parameters(),
                                                worker_copy2.parameters()):
                
                ## Add Parameters of modified Workercopy, subtract initial
                ## Parameters (otherwise they would be counted twice).
                theta_1.data += theta_2.data - theta_3.data 
    
        ## Calculate and store Losses.
        epoch_loss_train[epoch+1] = models[-1].test(train_batches)
        epoch_loss_test[epoch+1] = models[-1].test(test_batches)
    
    # Print same model parameter as before to see whether it has changed
    #print("Theta_m 0 after training: {}".format(list(models[-1].parameters())[2].data))

    return (epoch_loss_train, epoch_loss_test)
    

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin1 = nn.Linear(2, 100)
        self.lin2 = nn.Linear(100, 3)

    def forward(self, x):
        self.train()
        x = self.lin1(x)
        x = torch.relu(x)
        x = self.lin2(x)
        output = F.log_softmax(x, dim=1)
        return output
    
    def test(self, data_loader):  ## returns the average loss over a data loader.
        self.eval()
        loss = 0
        
        with torch.no_grad():  ## don't keep track of gradients.
            for (features, targets) in data_loader:  ## iterate over batches.
                output = self.forward(features)  ## get model output.
                
                ## Add Losses of all Samples in Batch.
                loss += F.nll_loss(output, targets, reduction="sum")
        
        loss /= len(data_loader.dataset)  ## average loss.
        
        return loss
                                                                        
                

I guess I have found the issue.
With
models = [model()] * nrworkers
I wanted to create a list of nrworkers independent model() instances. What I actually created was a list of references that all point to the same instance of model()…
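
A quick check confirms it (just a sketch, using my Net class from above):

models = [Net()] * 3
print(models[0] is models[1], models[1] is models[2])
# prints: True True  -> every list entry is the very same object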

Now… how can I change it to what I want? I want N independent network instances with the same initial parameters: the parameters should be sampled randomly, but all of the networks should start from the same values.

You can initialize nrworkers models first and then load the state_dict of one of them into all the others:

models = [model() for _ in range(nrworkers)]
sd = models[0].state_dict()
for i in range(1, len(models)):
    models[i].load_state_dict(sd)
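
To double-check that the models are now distinct objects but start from the same values, something like this should work (just a sketch; I'm assuming model() is your Net class, so each model has a lin1 layer):

print(models[0] is models[1])                                     # False: separate objects
print(torch.equal(models[0].lin1.weight, models[1].lin1.weight))  # True: identical initial values

If you also need the master network, as in your (nrworkers+1) line, just extend the range accordingly.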

Thank you, Patrick!

Is there a difference between your models = [model() for _ in range(nrworkers)] and my models = [model()] * nrworkers? I would have thought your code gives the same result as mine: a list of references pointing to the same model.

Yes, your approach clones the reference to one model, so all nrworkers list entries refer to the same parameters.
My approach initializes nrworkers different models, which is the behavior you want, if I understand the use case correctly.
Here is a small example:

# reusing the same model
models = [nn.Linear(1, 1)] * 2
print(models[0].weight)
> Parameter containing:
tensor([[-0.1335]], requires_grad=True)

print(models[1].weight)
> Parameter containing:
tensor([[-0.1335]], requires_grad=True)


# manipulate inplace
with torch.no_grad():
    models[0].weight.fill_(0.)

print(models[0].weight)
> Parameter containing:
tensor([[0.]], requires_grad=True)

print(models[1].weight)
> Parameter containing:
tensor([[0.]], requires_grad=True)

# creating separate models
models = [nn.Linear(1, 1) for _ in range(2)]
print(models[0].weight)
> Parameter containing:
tensor([[-0.7255]], requires_grad=True)

print(models[1].weight)
> Parameter containing:
tensor([[0.4155]], requires_grad=True)

# manipulate inplace
with torch.no_grad():
    models[0].weight.fill_(0.)

print(models[0].weight)
> Parameter containing:
tensor([[0.]], requires_grad=True)

print(models[1].weight)
> Parameter containing:
tensor([[0.4155]], requires_grad=True)

As you can see, with your approach the parameters are the same and in-place manipulations are reflected in all copies of the model, while this is not the case with my approach.
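
If you want to skip the explicit state_dict loop, a copy.deepcopy-based variant should also give you independent models with identical start values (just a sketch using your variable names):

import copy

base = model()  # one randomly initialized network
models = [copy.deepcopy(base) for _ in range(nrworkers + 1)]  # independent copies of the same initial state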
