RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

So no, you should not set retain_graph everywhere. If this error is not raised, everything is fine. If it is raised, it means you did something wrong in your code.
The difference, I guess, is in the way you define your Variables in the script versus in the command line.


Thanks for your reply!
Here is my code in the command line:


How could I fix this bug?

So here the bug in your code is that you try to backprop through part of the graph twice. If you actually want to backprop through part of the graph twice, then you should set retain_graph=True the first time you call .backward(). If you were not expecting to backprop through part of the graph twice, then your implementation is doing something wrong somewhere, because in practice it is doing exactly that.
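
For illustration, here is a minimal sketch (made-up computation, modern tensor syntax) of backpropagating through the same graph twice:

import torch

x = torch.rand(3, requires_grad=True)
out = (x * x).sum()
# The first backward keeps the graph alive so it can be traversed again
out.backward(retain_graph=True)
# The second backward through the same graph now works (gradients accumulate in x.grad)
out.backward()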


@albanD
Really sorry for my ambiguous wording :sweat_smile:
What I am wondering is: I did not set retain_graph to True in either the Python script or the command line, yet I got different results. The script runs as I expect, while in the command line I get this error.
What is the difference between the two? In my opinion, they should behave the same :thinking:


Is it because in every iteration of the loop I get a new graph? So no matter what value retain_graph has, the script will always work fine?


In PyTorch, every time you perform a computation with Variables, you create a graph. When you then call backward on the last Variable, it traverses this graph to compute the gradients for everything in it (and deletes the graph as it goes through it if retain_graph=False). In your command line session, you created a single graph and tried to backprop through it twice (without retain_graph), so it fails.
If instead you redo the forward computation inside your loop, then the Variable on which you call backward is not the same, and the graph attached to it is not the same as the one from the previous iteration. So no error there.
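
As a small sketch of the second case (made-up computation): when the forward is redone inside the loop, each backward traverses a fresh graph, so no retain_graph is needed:

import torch

a = torch.rand(3, 3, requires_grad=True)

for i in range(10):
    b = a * a           # recomputed every iteration, so a new graph is built each time
    d = b * b
    d.sum().backward()  # works without retain_graph: each backward frees only its own graph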

The common mistake (that would raise the mentioned error even though you are not intentionally sharing a graph) is to perform some computation just before the loop, so that even though you create new graphs inside the loop, they all share a common part built outside the loop, like below:

a = torch.rand(3, 3, requires_grad=True)

# This part of the graph is shared by all iterations and will make the second backward fail!
b = a * a

for i in range(10):
    d = b * b
    # The first backward here will work, but the second will not,
    # because the shared part (b = a * a) has already been freed.
    d.sum().backward()
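
One possible fix for the snippet above, assuming the shared computation really has to stay outside the loop, is to retain the graph on every backward (at the cost of keeping those buffers in memory):

import torch

a = torch.rand(3, 3, requires_grad=True)

# Still shared by all iterations
b = a * a

for i in range(10):
    d = b * b
    # retain_graph=True keeps the buffers of the shared part, so later backwards can reuse them
    d.sum().backward(retain_graph=True)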

Thanks a lot!
Perfect answer!
Wish you a good day!


I'm getting this same issue and can't quite figure out where the re-use of an already freed graph could be. Am I missing something obvious?

code:

def train(epoch, model, train_loader, optimizer, criterion, summary, use_gpu=False, log_interval=10):
    correct = 0
    total = 0
    for idx, (x, y) in enumerate(train_loader):
        y = y.squeeze(1)
        x, y = Variable(x), Variable(y)
        x = x.cuda() if use_gpu else x
        y = y.cuda() if use_gpu else y

        preds = model(x)

        loss = criterion(preds, y)
        loss.backward(retain_graph=True)
        optimizer.step()
        optimizer.zero_grad()

        # TODO: turn this part into callbacks
        if idx % log_interval == 0:
            # Log loss
            index = (epoch * len(train_loader)) + idx + 1
            avg_loss = loss.data.mean()
            summary.add_scalar('train/loss', avg_loss, index)

            # Log accuracy
            total += len(x)
            pred_classes = torch.max(preds.data, 1)[1]
            correct += (pred_classes == y.data).sum()
            acc = correct / total
            summary.add_scalar('train/acc', acc, index)

Without retain_graph=True I get the same exception as above


Very elegant and interesting example. So if I want to backward more than once without retain_graph, I need to redo the computation from all leaves?


Yes, because a backward without retain_graph is basically a "backward in which you delete the graph as you go along".
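
A tiny sketch (made-up tensors) of what "redo the computation from all leaves" means in practice:

import torch

w = torch.rand(3, requires_grad=True)

out = (w * w).sum()
out.backward()       # the graph is freed here

out = (w * w).sum()  # redo the forward starting from the leaf w
out.backward()       # this builds and traverses a brand new graph, so it works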


What are the "intermediary results" exactly?
This process is giving me vertigo. What exactly happens that creates and then deletes these "results"?


Intermediary results are values from the forward pass that are needed to compute the backward pass.
For example, say your forward pass looks like this and you want gradients for the weights:

middle_result = first_part_of_net(inp)
out = middle_result * weights

When computing the gradients, you need the value of middle_result. And so it needs to be stored during the forward pass. This is what I call intermediary results.

These intermediary results are created whenever you perform operations that require some of the forward tensors to compute their backward pass.
To reduce memory usage, during the backward pass, these are deleted as soon as they are not needed anymore (of course if you use this Tensor somewhere else in your code you will still have access to it, but it won’t be stored by the autograd engine anymore).
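
To make this concrete, here is a small sketch of the example above (made-up shapes, with first_part_of_net replaced by a simple operation). The gradient of out with respect to weights is exactly middle_result, so autograd has to keep that value around until backward runs:

import torch

inp = torch.rand(5)
weights = torch.rand(5, requires_grad=True)

middle_result = inp * 2                # stands in for first_part_of_net(inp)
out = (middle_result * weights).sum()
out.backward()

# d(out)/d(weights) == middle_result, which had to be saved during the forward pass
print(torch.allclose(weights.grad, middle_result))  # True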


Thank you.
So is this .backward() method called behind the scenes somewhere inside PyTorch when the optim.Adam method is run?
I've seen a few examples of neural networks in PyTorch, but I don't get where the weights are.

You can check the tutorials on how to train a neural network and what each function is doing.


I am not sure how retain_graph=True and zero_grad() interact. Have a look at this:

(code is adapted from the answer and might not be 100% correct, but I hope you get what I mean)

prediction = self(x)
self.zero_grad()
loss = self.loss(prediction, y)
loss.backward(retain_graph=True) #retains weights --> gradients?
loss.backward() ## add gradients to gradients? makes them a lot stronger?

vs:
loss.backward(retain_graph=True) #retains weights? --> gradients
self.zero_grad()
loss.backward() ## gradients are zero, how is retain_graph=True effective in this case?

or is retain_graph just keeping the weights rather than the gradients? I am a bit confused.

Hi,

retain_graph has nothing to do with gradients. It just allows you to call backward a second time. If you don't set it in the first .backward() call, you won't be able to call backward a second time.
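
To make the interaction with the gradients concrete, a small sketch (made-up tensors): retain_graph only keeps the graph alive, while the gradients themselves simply accumulate unless you zero them in between:

import torch

x = torch.rand(3, requires_grad=True)
loss = (x * x).sum()

loss.backward(retain_graph=True)
first_grad = x.grad.clone()

loss.backward()  # only possible because of retain_graph above
print(torch.allclose(x.grad, 2 * first_grad))  # True: the second backward added to x.grad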


Are you sure you get an error if you don't use retain_graph=True?

This seems to be the normal pattern for running a model in a training loop, since the model creates a new graph every time it computes preds.

If you could be more specific about the problem you faced, that would be very helpful to me.

Thanks

Hi,

Do I need to wrap up tensors or variables that are not being used in autograd? For example, below is my training code. I have a few numpy arrays and lists created only to store results. Do I need to wrap them up too?

def train(epoch):

    trainSeqs = dataloader.train_seqs_KITTI
    trajLength = range(dataloader.minFrame_KITTI, dataloader.maxFrame_KITTI, 10)

    rn.shuffle(trainSeqs)
    rn.shuffle(trajLength)

    avgT_Loss = 0.0
    avgR_Loss = 0.0

    num_itt = 0

    avgRotLoss = []
    avgTrLoss = []

    loss_itt = np.empty([cmd.itterations, 2])

    for seq in trainSeqs:
        for tl in trajLength:
            # get a random subsequence from 'seq' of length 'tl': starting index, ending index
            stFrm, enFrm = dataloader.getSubsequence(seq, tl, cmd.dataset)
            # iterate over this subsequence and get the frame data.
            flag = 0
            print(stFrm, enFrm)
            for frm1 in range(stFrm, enFrm):

                inp, axis, t = dataloader.getPairFrameInfo(frm1, frm1 + 1, seq, cmd.dataset)

                deepVO.zero_grad()
                # Forward, compute loss and backprop
                output_r, output_t = deepVO.forward(inp, flag)
                loss_r = criterion(output_r, axis)
                loss_t = criterion(output_t, t)
                # Total loss
                loss = loss_r + cmd.scf * loss_t
                if frm1 != enFrm - 1:
                    loss.backward(retain_graph=True)
                else:
                    loss.backward(retain_graph=False)
                optimizer.step()

                avgR_Loss = (avgR_Loss * num_itt + loss_r) / (num_itt + 1)
                avgT_Loss = (avgT_Loss * num_itt + loss_t) / (num_itt + 1)

                loss_itt[num_itt, 0] = loss_r
                loss_itt[num_itt, 1] = loss_t

                flag = 1
                num_itt = num_itt + 1
                print(num_itt)
                if num_itt == cmd.itterations:
                    avgRotLoss.append(np.average(loss_itt[:, 0]))
                    avgTrLoss.append(np.average(loss_itt[:, 1]))
                    print(np.average(loss_itt[:, 0]), np.average(loss_itt[:, 1]))
                    num_itt = 0

    plt.plot(avgRotLoss, 'r')
    plt.plot(avgTrLoss, 'g')

    plt.savefig('/u/sharmasa/Documents/DeepVO/plots/epoch_' + str(epoch))

@albanD
I still don't understand why, even though I call the forward() function before loss.backward(), I still get the error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I'm not sure I understand your question.
Maybe open a new thread, as this one is quite long already.