Iterative Loss: Loss from first iteration when added in the second iteration does not work

mbsyed · April 21, 2021, 4:49pm

Hi,
I am implementing an iterative loss for a project.
Basically, what I am trying to do is the following.

import torch.nn as nn
import torch.optim as optim

# toy_net, device and dataloader are defined elsewhere
model = toy_net().to(device)
opt = optim.Adam(model.parameters())  # optimizer
epochs = 50                           # epochs
k = 3                                 # iterations per batch
loss_fn = nn.BCELoss()

for epoch in range(epochs):
    for inputs, labels in dataloader:
        losses_iter = [0] * k
        x, y = inputs.to(device), labels.to(device)
        for i in range(k):
            Z = ((i + 1) * (i + 2)) / 2           # normaliser: 1 + 2 + ... + (i + 1)
            opt.zero_grad()                       # clear gradients from the previous step
            out = model(x)                        # get model output
            loss = loss_fn(out, y)                # get loss
            losses_iter[i] = loss * (i + 1)
            total_loss = sum(losses_iter) / Z     # sum the weighted losses and scale by Z
            total_loss.backward()
            opt.step()
            losses_iter[i] = loss.detach() * (i + 1)  # kept only as a constant from here on

The problem I am having is that once I detach the loss from the first iteration and add it in the second iteration, it is just a constant. Its derivative is 0, so it is not taken into account in the second iteration.
Any help would be appreciated.

ptrblck · April 23, 2021, 5:59am

That is the expected behavior of detached tensors. If you want to take the loss into account in the second iteration, you would have to keep it attached to the computation graph and might need to set retain_graph=True in the backward operation.
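To make the difference concrete, here is a minimal sketch with a single hypothetical scalar parameter `w` (not part of the original code), chosen so the gradients can be checked by hand. It compares the gradient of a summed loss when the first term is detached and when it stays attached.

```python
import torch

# Hypothetical one-parameter "model" so the gradients can be verified by hand.
w = torch.tensor(2.0, requires_grad=True)

loss1 = 3.0 * w   # d(loss1)/dw = 3
loss2 = w ** 2    # d(loss2)/dw = 2 * w = 4

# Detached: loss1 is treated as a constant, so only loss2 contributes.
# retain_graph=True keeps the graph alive for the second backward call below.
(loss2 + loss1.detach()).backward(retain_graph=True)
grad_detached = w.grad.clone()

w.grad = None     # reset before the second measurement

# Attached: both terms contribute to the gradient.
(loss2 + loss1).backward()
grad_attached = w.grad.clone()

print(grad_detached.item(), grad_attached.item())  # 4.0 7.0
```

The detached term still changes the value of the summed loss, but it adds nothing to the gradient.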

mbsyed · April 23, 2021, 12:43pm

Thank you for the response. That works but only if I do not update the weights using opt.step(). In my use case I need to update the weights before I run the next iteration.
Basically, I am working on a segmentation problem. My inputs are images with 4 channels (RGB plus a fourth channel that is initialized to zeros). After the first iteration, the fourth channel is replaced with the outputs of the updated model. This new input is then fed into the same model again, and we calculate the loss for the current iteration and add the loss from the previous iteration. It would essentially look something like the following.

model = Net()
x     = (samples, 4, height, width)   # input shape
y     = (samples, 1, height, width)   # target shape
opt   = Adam(model.parameters())

#-------- iteration 1 -----------
opt.zero_grad()
outputs1 = model(x)
loss1    = loss_fn(outputs1,y)
loss = (loss1)*1
loss.backward(retain_graph=True)
opt.step()
x[:, 3:4, :, :] = model(x).detach()                #4th channel updated for iteration 2

#-------- iteration 2 -----------
opt.zero_grad()
outputs2 = model(x)
loss2      = loss_fn(outputs2,y)
loss        = (2/3)*(loss2) + (1/3)*(loss1)           #loss weighted from previous iter and current iter
loss.backward(retain_graph=True)                      #This gives an error: one of the variables needed for
                                                      #gradient computation has been modified by an inplace operation.
                                                      #If I don't do opt.step() then there is no error
opt.step()
x[:, 3:4, :, :] = model(x).detach()                #4th channel updated for iteration 3

#-------- iteration 3 -----------
opt.zero_grad()
outputs3 = model(x)
loss3      = loss_fn(outputs3,y)
loss        = (1/2)*(loss3) + (1/3)*(loss2) + (1/6)*(loss1)           #loss weighted from previous iters and current iter
loss.backward()             
opt.step()

I think the main issue now is that I need to update the model before I can update my input for the next iteration, because the new input uses the outputs of the updated model. But when I update the model and then backpropagate in the second iteration, whose loss is the weighted sum of the losses from iterations 1 and 2, I get an error even with retain_graph=True. I am not sure what is causing this or how to get around it.

ptrblck · April 23, 2021, 7:24pm

I guess you are seeing an error because you are trying to calculate the gradients (in iter2) from stale forward activations (calculated in iter1).
Have a look at this post for more information and check whether you are hitting the same error.

mbsyed · April 23, 2021, 7:48pm

It sort of fits the error I am getting. The only difference is that I am calculating a different output for iteration 2 and a different loss function, but the backward is done on the added loss, where loss = loss_iter1 + loss_iter2. Is there a way I can fix this?

ptrblck · April 23, 2021, 9:29pm

This would explain the error, since loss_iter1 would still reference the old computation graph.

It depends on your actual use case, since you are using stale forward activations to calculate the parameter updates, which is wrong.
From the linked issue:

- loss_iter1 was calculated using the first forward pass and the model with parameter_set_0
- this forward pass also calculated all intermediate forward activations (fwd_set_0) and stored them, which are needed to compute the gradients
- you are updating the model to parameter_set_1, all forward activations are now stale, since they were not calculated by parameter_set_1
- loss_iter1.backward() tries to compute the gradients using fwd_set_0 and parameter_set_1, which is wrong and fails
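The sequence above can be reproduced in a few lines. This is a minimal sketch and not the code from this thread: the two-layer model, the SGD optimizer, the MSE loss and the random data are all placeholders picked for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model: the backward pass needs the second layer's weight.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 4), torch.randn(16, 1)

# Forward pass with parameter_set_0; autograd stores fwd_set_0 for the backward pass.
loss_iter1 = nn.functional.mse_loss(model(x), y)
loss_iter1.backward(retain_graph=True)
opt.step()  # in-place update of the weights to parameter_set_1

# A second backward through the old graph would mix fwd_set_0 with parameter_set_1.
try:
    loss_iter1.backward()
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

Autograd notices that a saved tensor was changed in place after the forward pass, so retain_graph=True keeps the graph alive but does not make it valid again.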

You could either delay the optimizer.step() call or recompute the forward activations depending on your use case.
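As a rough sketch of the "recompute" option, the loop below keeps the inputs of earlier iterations and re-runs them through the current model, so every term of the weighted loss comes from a fresh graph. The small convolutional network, the tensor sizes and the name `inputs_so_far` are assumptions made for illustration. Note that the earlier losses are then evaluated with the updated parameters, which may or may not be what your use case needs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the segmentation network: 4 input channels, 1 output channel.
model = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

rgb = torch.rand(2, 3, 16, 16)                           # dummy RGB images
y = torch.randint(0, 2, (2, 1, 16, 16)).float()          # dummy binary masks
x = torch.cat([rgb, torch.zeros(2, 1, 16, 16)], dim=1)   # 4th channel starts at zero

k = 3
inputs_so_far = []  # inputs of every iteration, kept so their losses can be recomputed
for i in range(k):
    inputs_so_far.append(x)
    Z = (i + 1) * (i + 2) / 2  # normaliser: 1 + 2 + ... + (i + 1)

    opt.zero_grad()
    # Recompute every loss with the current parameters, so no stale graph is reused.
    total_loss = sum((j + 1) * loss_fn(model(x_j), y)
                     for j, x_j in enumerate(inputs_so_far)) / Z
    total_loss.backward()  # retain_graph is not needed here
    opt.step()

    # Build the next input as a new tensor from the updated model's prediction.
    with torch.no_grad():
        x = torch.cat([rgb, model(x)], dim=1)
```

The weights match the ones in the question (1, then 1/3 and 2/3, then 1/6, 1/3 and 1/2), at the cost of i + 1 forward passes in iteration i. The other option, delaying optimizer.step() until all losses are accumulated, avoids the extra forward passes, but the fourth channel would then come from a model that has not been updated yet.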