How can the variable computed in the previous epoch be used for computing the loss function in the next epoch?

The code is given below. How can the variable computed in the previous epoch be used for
computing the loss function in the next epoch?

for epoch in range((args.start_epoch + 1), args.epochs):
    for input, target in train_loader:
        target = target.cuda()
        input = input.cuda()
        input_var = torch.autograd.Variable(input)
        target_var = torch.autograd.Variable(target)
        outputs, feature = model(input_var)
        if epoch > 0:
            l = criterion.forward(feature, target_var, Fea)
    Fea = function(model, train_loader)

If you want to use feature from the previous iteration, you could store it in another variable so that it won’t be overwritten by the model(input_var) call.
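A minimal sketch of that idea, reusing the names from the snippet above (prev_feature is a made-up name here, and detach() is assumed so the stale graph from the previous epoch is not kept alive):

    prev_feature = None
    for epoch in range((args.start_epoch + 1), args.epochs):
        for input, target in train_loader:
            input, target = input.cuda(), target.cuda()
            outputs, feature = model(input)
            if prev_feature is not None:
                # feature stored at the end of the previous epoch
                l = criterion(feature, target, prev_feature)
                l.backward()
        # keep a copy before the next epoch overwrites `feature`
        prev_feature = feature.detach()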

Note that Variables are deprecated since PyTorch 0.4, so you can use tensors in newer versions.
Also, call nn.Modules directly via criterion(feature, ...) instead of the .forward method, as the latter approach won’t call into registered hooks and might yield unexpected behavior.
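For example, using the names from the snippet above:

    # preferred: calling the module dispatches to forward and also fires registered hooks
    l = criterion(feature, target_var, Fea)

    # discouraged: calling .forward directly skips any registered hooks
    l = criterion.forward(feature, target_var, Fea)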

Thank you very much!

l = criterion(feature, target_var, Fea)
l.backward(retain_graph=True)
may be the solution.

I want to use the class feature centers in the loss function, but when I compute the centers the GPU runs out of memory. How can I solve it? The code is as follows. Thank you very much!

for epoch in range((args.start_epoch + 1), args.epochs):
    Center = computer_Center(model, dataloader, classnum)
    for input, target in train_loader:
        target = target.cuda()
        input = input.cuda()
        input_var = torch.autograd.Variable(input)
        target_var = torch.autograd.Variable(target)
        outputs, feature = model(input_var)
        l = criterion(feature, target_var, Center)
        l.backward(retain_graph=True)

def computer_Center(model, dataloader, classnum):
    model.train()
    for i in range(classnum):
        j = 0
        for input, target in dataloader:
            target = target.cuda()
            input = input.cuda()
            input_var = torch.autograd.Variable(input)
            target = torch.autograd.Variable(target)
            _, feature_ext = model(input_var)
            ind = torch.where(target == i)[0]
            if ind.shape[0] > 0:
                if j == 0:
                    feature_mid = feature_ext[ind, :]
                    feature_sum_mid = feature_mid.sum(0)
                else:
                    feature_mid = feature_ext[ind, :]
                    feature_sum_mid = feature_sum_mid + feature_mid.sum(0)
                j = j + 1

        feature_sum_mid = feature_sum_mid.unsqueeze(0)
        if i == 0:
            feature_sum = feature_sum_mid
        else:
            feature_sum = torch.cat([feature_sum, feature_sum_mid], dim=0)

    # divide each per-class feature sum by the number of samples in that class
    Center = feature_sum
    for i in range(classnum):
        Center[i, :] = feature_sum[i, :] / ClaSamNum[i]

    return Center

In your code snippet you are accumulating the model output feature_ext in feature_sum_mid, which will also store the computation graph (including all intermediate tensors).
If you want to use this Center tensor as a constant target, you should wrap the calculation in computer_Center in a with torch.no_grad() block to avoid storing the computation graphs and thus lower the memory usage.

Also, Variables are deprecated since PyTorch 0.4, so you can use tensors in newer versions. :wink:
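A minimal sketch of that suggestion, as one possible rewrite of computer_Center (a single pass over the dataloader instead of one pass per class, with the per-class counts taken from the targets rather than from ClaSamNum):

    def computer_Center(model, dataloader, classnum):
        sums, counts = None, None
        with torch.no_grad():  # autograd records nothing here, so no graphs or intermediates are kept
            for input, target in dataloader:
                input, target = input.cuda(), target.cuda()
                _, feature_ext = model(input)
                if sums is None:
                    sums = feature_ext.new_zeros(classnum, feature_ext.size(1))
                    counts = feature_ext.new_zeros(classnum)
                for i in range(classnum):
                    ind = torch.where(target == i)[0]
                    if ind.numel() > 0:
                        sums[i] += feature_ext[ind].sum(0)
                        counts[i] += ind.numel()
        # mean feature per class; clamp avoids division by zero for empty classes
        return sums / counts.clamp(min=1).unsqueeze(1)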

Thank you very much. I want to compute the loss for the Center. The Center is not a constant target. Now I compute the Center batch-wise, but the performance is not good.

Even though you are calculating the center manually, does Autograd need to backpropagate through these operations?
Compare it to the target of a classification use case. While the target tensor is of course not a constant value, it’s a constant in the sense that Autograd will use it to calculate the loss, and based on this compute the gradient for the parameters in the model, which created the prediction.

Would the same reasoning apply to your use case here?
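A minimal illustration of that point, reusing the names from the earlier snippets: detaching the centers makes autograd treat them like a classification target, i.e. they contribute to the loss value but no gradient flows back through the operations that produced them.

    Center = computer_Center(model, dataloader, classnum)

    # Center acts as a constant for autograd: gradients flow back only through `feature`
    loss = criterion(feature, target_var, Center.detach())
    loss.backward()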

I see. Thank you very very very much!