Computational graph is only built once

I’m using PyTorch for autograd without any neural network involved.

I can’t post fully reproducible code without it being too many lines. From a high level, it looks like the following:

class MarkovModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.potentials = nn.ParameterDict()

    def set_potentials(self, potentials):
        ...

class BeliefProb():
    def __init__(self, model):
        self.model = model
        self.beliefs = ...

    def update_beliefs(self):
        self.beliefs = ...

    def inference(self):
        for i in range(SOME_CONSTANT):
            ...

model = MarkovModel()
potentials = ...
bp = BeliefProb(model)
optimizer = Adam(bp.model.parameters(), lr=0.01)
target = ...
for i in range(iters):
    bp.inference()
    loss = torch.abs(target - bp.beliefs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

If I set iters = 1, then it works fine. But if iters > 1, I get the complaint

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

It seems that the first call to bp.inference() builds the computational graph, but when I call bp.inference() again, the graph is not rebuilt for the second iteration.
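The behavior can be reproduced without the model at all. In this toy sketch (hypothetical tensors, not the author's code), a value is computed once outside the loop, so every iteration's loss shares that piece of the graph, and the second backward() hits the freed intermediates:

```python
import torch

# `pre` is built once outside the loop, so every iteration's loss
# shares `pre`'s graph nodes.
w = torch.tensor(2.0, requires_grad=True)
pre = w * w  # pre-computed once; its saved intermediates are shared

errors = []
for i in range(2):
    loss = torch.abs(pre - 1.0).sum()
    try:
        loss.backward()  # the first call frees pre's saved intermediates
    except RuntimeError as e:
        errors.append(str(e))

# only the second iteration fails, with "Trying to backward through
# the graph a second time ..."
print(len(errors))
```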

I think avoiding multiple calls to backward() resolves this.
For example, initialize loss with None, accumulate the loss computed in each iteration, and call backward() once after the for loop, like

loss = None
for _ in range(iters):
    cur_loss = torch.abs(target - bp.beliefs).sum()
    if loss is None: loss = cur_loss
    else: loss += cur_loss
loss.backward()
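A runnable sketch of this accumulate-then-single-backward pattern, with toy tensors standing in for the model (the names here are illustrative, not the author's):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
pre = w * 3.0           # even a sub-graph shared across iterations is fine
target = torch.tensor(1.0)
iters = 3

loss = None
for _ in range(iters):
    cur_loss = torch.abs(target - pre).sum()
    loss = cur_loss if loss is None else loss + cur_loss
loss.backward()         # one backward: the shared graph is traversed once
print(w.grad)           # tensor(9.)
```

Because backward() runs only once, the shared pre-computation never has its intermediates freed mid-run.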

A toy example is at computational-graph-is-only-built-once.ipynb · GitHub

Thanks for your reply. I think I forgot to include the optimizer in the code snippet. Please see the last block, where I call optimizer.step() during each iteration. Because of that, I can’t call loss.backward() once after the loop.

The updated code snippet looks basically fine.


What this error means is that some part of the graph is shared between iterations. This is most likely because you pre-compute something outside of the for loop and re-use it at each iteration inside it.
If you want gradients to flow through these computations, you should move them inside the for loop.
If you don’t want gradients to flow, you should .detach() the result of the pre-computations so that gradients won’t flow back there.
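As a sketch of the first option (toy stand-ins for the model, not the author's code): recomputing the former pre-computation inside the loop gives each backward() a fresh graph, so per-iteration optimizer.step() calls work without retain_graph=True.

```python
import torch

w = torch.nn.Parameter(torch.tensor(2.0))
optimizer = torch.optim.Adam([w], lr=0.01)
target = torch.tensor(1.0)

for _ in range(3):
    pre = w * 3.0                       # rebuilt each iteration -> fresh graph
    loss = torch.abs(target - pre).sum()
    optimizer.zero_grad()
    loss.backward()                     # no retain_graph needed
    optimizer.step()

# Alternative, when gradients should NOT flow through the shared part:
# pre = (w * 3.0).detach()  # a one-time pre-computation is then safe to reuse
```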


Thanks for the reply! You are right: after I moved the pre-computation inside the loop, gradients propagate smoothly.