How to clear autograd information between iterations when using eval mode?

tueboesen · April 14, 2021, 4:47pm

My code requires autograd to be enabled even in eval mode, since I need the gradient of my output with respect to one of my input variables. However, in eval mode this somehow accumulates gradient information. (In training mode this is not a problem, since the loop also calls loss.backward() and optimizer.step(), which, together with optimizer.zero_grad(), free up the memory.)

How do I free the memory in a similar fashion, but without actually running loss.backward()?

Here is a simplified version of my train/eval mode:

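import torch
import torch.nn.functional as F
from torch.autograd import grad

def use_model(model,dataloader,train,optimizer,device,batch_size=1):
    aloss = 0.0
    if train:
        model.train()
    else:
        model.eval()
    for i, (Ri, Fi, Ei, zi) in enumerate(dataloader):
        # Positions must require grad so that forces = -dE/dR can be computed
        Ri.requires_grad_(True)
        # getIterData_MD17 is my own data-preparation helper (not shown)
        xn, xe, G = getIterData_MD17(Ri.squeeze(), device=device)

        if train:
            optimizer.zero_grad()
        xnOut, xeOut = model(xn, xe, G)

        # Predicted energy, and forces as the negative gradient w.r.t. positions
        E_pred = torch.sum(xnOut)
        F_pred = -grad(E_pred, Ri, create_graph=True)[0].requires_grad_(True)
        loss = F.mse_loss(F_pred, Fi)
        if train:
            loss.backward()
            optimizer.step()
        aloss += loss.detach()
    return aloss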

ptrblck · April 15, 2021, 8:10am

I don’t fully understand the use case.
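Based on your statement that autograd needs to be enabled in eval mode, you are calculating gradients during validation, while your question about freeing memory without running loss.backward() suggests that you are never using the backward operation. How are these gradients calculated?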

Note that you could also freeze all parameters that shouldn't receive gradients, which should also save memory.
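A minimal sketch of what freezing could look like during validation, assuming a small hypothetical `nn.Sequential` model in place of the real network: every parameter gets `requires_grad_(False)`, while the input `R` still requires grad so that dE/dR can be computed. This is an illustration of the idea, not the poster's code.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

# Hypothetical toy model standing in for the real network.
model = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
model.eval()

# Freeze all parameters: autograd no longer needs to track gradients
# with respect to them during validation.
for p in model.parameters():
    p.requires_grad_(False)

# The input still requires grad, because we want dE/dR.
R = torch.randn(5, 3, requires_grad=True)
E = model(R).sum()

# torch.autograd.grad returns the gradient instead of accumulating it
# into .grad; without create_graph the graph is freed after this call.
dE_dR = grad(E, R)[0]

param_grads = [p.grad for p in model.parameters()]
```

Remember to set `requires_grad_(True)` on the parameters again before going back to training.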

tueboesen · April 16, 2021, 1:20am

You are right, I am using the backward operation implicitly when I use:

F_pred = -grad(E_pred, Ri, create_graph=True)[0].requires_grad_(True)
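To see what this line does in isolation, here is a sketch with a toy energy E(R) = sum(R²) (an assumption chosen because its force, -2R, is known analytically). It also shows that with `create_graph=True` the result already requires grad, so the extra `.requires_grad_(True)` adds nothing, while without `create_graph` the force comes back as a plain tensor with no graph attached.

```python
import torch
from torch.autograd import grad

# Toy energy E(R) = sum(R**2); its analytic force is F = -dE/dR = -2R.
R = torch.randn(4, 3, requires_grad=True)
E_pred = torch.sum(R ** 2)

# create_graph=True builds a graph for F_pred itself, which is only
# needed if a loss on F_pred will be backpropagated (i.e. in training).
F_pred = -grad(E_pred, R, create_graph=True)[0]

# Evaluation-style call: no graph is built for the result.
E_eval = torch.sum(R ** 2)
F_eval = -grad(E_eval, R)[0]
```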

I guess my question is: how do I clear the gradient information after invoking this call?
(Basically, I need autograd to compute a derivative through my network at each iteration of the validation/testing step, but I do not want to update my network parameters, and after each iteration I want to make sure I have discarded all the gradient information the network might have built up.)

Does that make sense?

ptrblck · April 16, 2021, 5:03am

torch.autograd.grad should free all intermediate tensors after it runs, as long as retain_graph is not set to True (the same way tensor.backward() behaves).
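One detail worth checking here: in `torch.autograd.grad`, `retain_graph` defaults to the value of `create_graph`, so a call with `create_graph=True` also keeps the graph alive. The sketch below (the helper name `second_call_succeeds` is hypothetical) calls `grad` twice on the same graph; the second call only works when the first one retained it.

```python
import torch
from torch.autograd import grad

def second_call_succeeds(create_graph: bool) -> bool:
    """Call grad twice on one graph; report whether the second call works."""
    x = torch.randn(3, requires_grad=True)
    y = (x.exp() * x).sum()  # exp and mul save tensors for backward
    # retain_graph is not passed, so it defaults to create_graph.
    grad(y, x, create_graph=create_graph)
    try:
        grad(y, x)  # needs the saved tensors from the first pass
        return True
    except RuntimeError:
        # Raised when the graph's buffers were already freed.
        return False
```

So `second_call_succeeds(False)` returns `False` (the graph was freed), while `second_call_succeeds(True)` returns `True`.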

tueboesen · April 16, 2021, 5:37pm

It really was that simple, thank you.

I just needed to remove the requires_grad_ from F_pred when running in validation mode.
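A sketch of a single iteration along the lines of this fix, assuming a hypothetical `force_step` helper and a toy model that takes positions directly (instead of `getIterData_MD17` and the graph inputs). Here the graph for `F_pred` is only built when training, by passing `create_graph=train`, and `F_pred` is not forced to require grad.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import grad

def force_step(model, R, F_true, train, optimizer=None):
    """One iteration: predict forces as -dE/dR and return the loss value."""
    model.train(train)  # model.train(False) is the same as model.eval()
    R = R.detach().requires_grad_(True)
    if train:
        optimizer.zero_grad()
    E_pred = model(R).sum()
    # Only build a graph for F_pred when loss.backward() will be called.
    F_pred = -grad(E_pred, R, create_graph=train)[0]
    loss = F.mse_loss(F_pred, F_true)
    if train:
        loss.backward()
        optimizer.step()
    return loss.detach()

# Usage with a hypothetical toy model and random data.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 8), nn.Tanh(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
R = torch.randn(5, 3)
F_true = torch.randn(5, 3)

val_loss = force_step(model, R, F_true, train=False)
grads_after_eval = [p.grad for p in model.parameters()]
train_loss = force_step(model, R, F_true, train=True, optimizer=optimizer)
```

In the validation call no parameter `.grad` is populated and the returned loss carries no graph, so nothing is kept alive between iterations.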