RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

Perhaps I got it this time :sweat_smile:

  1. Is it for this reason: only a tensor with requires_grad = True has the attribute grad_fn, because only the grad_fn of such tensors is meaningful?
     If this is right, I think the description in the official tutorial:

Tensor and Function are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references a Function that has created the Tensor (except for Tensors created by the user - their grad_fn is None ).

is somewhat confusing.

  2. Or is the requires_grad attribute of the data we get from dataloaders False? Or do we always see the grad_fn of the data we get from dataloaders as None, no matter what their requires_grad is?

There’s no problem in the tutorial.
.grad_fn is set for backpropagating through this tensor, so only a node that is created by some operation and has requires_grad = True (which is also the definition of a non-leaf tensor) has a non-None grad_fn.
Hope things are clear this time.
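
A minimal sketch (assuming any recent PyTorch version) that shows this directly:

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf tensor created by the user
y = torch.tensor([3.0, 4.0])                      # leaf tensor, requires_grad=False

print(x.grad_fn)        # None -> leaf, even though it requires grad
print(y.grad_fn)        # None -> no gradient tracking at all
print((x * 2).grad_fn)  # <MulBackward0 ...> -> non-leaf, created by an op on x
print((y * 2).grad_fn)  # None -> y does not require grad, so no graph is built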


Nice example. Thanks, that helps me a lot. :grinning:

>>> a = torch.tensor([2.0,3.0], requires_grad=True)
>>> b = 5.0
>>> c = a+b
>>> d = c.sum()
>>> d.backward()
>>> d.backward()

Here, I am calling backward() multiple times but I am not getting any retain_graph error. Why don’t I need retain_graph=True here?

On the other hand, when I am doing:

>>> a = torch.tensor([2.0,3.0], requires_grad=True)
>>> b = 5.0
>>> c = a+b
>>> d = 2*c
>>> e = d.sum()
>>> e.backward()
>>> e.backward()

I am getting the appropriate error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

@albanD already explained this edge case in this thread here. :wink:
Your first example only uses an addition, which doesn’t need intermediate tensors to calculate the gradients.
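
A rough sketch of that difference: multiplying two gradient-requiring tensors saves its inputs for the backward pass, while a plain addition does not, so only the multiplication graph has saved buffers that are freed after the first backward():

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([4.0, 5.0], requires_grad=True)

# mul needs a and b in its backward, so the graph holds saved tensors
c = (a * b).sum()
c.backward(retain_graph=True)  # keeps the saved tensors alive
c.backward()                   # works, because the graph was retained

# add's backward just passes the gradient through, nothing is saved
d = (a + b).sum()
d.backward()
d.backward()                   # typically also works: nothing backward needs was freed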


I have the same error even in the following code:

    output1 = torch.rand((3, 5))
    output1.requires_grad = True

    output1 = output1 * 20
    y1 = torch.ones(3, dtype=int)

    lossfn = torch.nn.CrossEntropyLoss()
    loss = lossfn(output1, y1)

    loss.backward()

Where am I doing the first “backward” here?

Your code snippet works fine on my machine.
Which PyTorch version are you using?


I have understood the error from @albanD’s explanation, thank you for that! How does the code snippet below backward through the same graph a second time?

# backpropagate adversary network
self.critic_adversaries.zero_grad()
critic_adversaries_loss.backward()
nn.utils.clip_grad_norm_(self.critic_adversaries.parameters(), 1.0)
self.critic_optimizers[0].step()
# backpropagate agent network
self.critic_agents.zero_grad()
critic_agents_loss.backward() #THE ERROR IS THROWN HERE#
nn.utils.clip_grad_norm_(self.critic_agents.parameters(), 1.0)
self.critic_optimizers[1].step()

self.critic_adversaries is one network and self.critic_agents is another network. self.critic_optimizers is a list of Adam optimizer objects; the first element applies to the self.critic_adversaries network and the second one to the self.critic_agents network. The only common thing they have is that they are of the same CriticNet class.

Why do I get the error in the title of the thread? These two networks have nothing to do with one another.

Hi @albanD and @ptrblck, could you assist me with this?

I got the same error, but I don’t know how to fix it. Could you help me? Thanks, @albanD

def train(config, model, train_iter, dev_iter, test_iter):
    start_time = time.time()
    model.train()
    optimizer = torch.optim.Adadelta(model.parameters(),
                                     lr=config.learning_rate)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    total_batch = 1
    dev_best_loss = float('inf')
    last_improve = 0
    flag = False
    writer = SummaryWriter(log_dir=config.log_path + '/' +
                           time.strftime('%m-%d_%H.%M', time.localtime()))
    for epoch in range(config.num_epochs):
        print('Epoch [{}/{}]'.format(epoch + 1, config.num_epochs))
        for _, trains in enumerate(train_iter):
            model.zero_grad()
            left_trains, right_trains, mid_data = trains
            mid_trains, labels = mid_data
            labels = labels.cuda(config.cuda_id)
            outputs = model(left_trains.cuda(config.cuda_id),
                            right_trains.cuda(config.cuda_id),
                            mid_trains.cuda(config.cuda_id))
            loss = F.cross_entropy(outputs, labels)
            loss.backward()
            optimizer.step()
            if total_batch % 10 == 0:
                true = labels.data.cpu()
                predic = torch.max(outputs.data, 1)[1].cpu()
                train_acc = metrics.accuracy_score(true, predic)
                dev_acc, dev_loss = evaluate(config, model, dev_iter)
                if dev_loss < dev_best_loss:
                    dev_best_loss = dev_loss
                    torch.save(model.state_dict(), config.save_path)
                    improve = '*'
                    last_improve = total_batch
                else:
                    improve = ''
                time_dif = get_time_dif(start_time)
                msg = 'Iter: {0:>6},  Train Loss: {1:>5.2},  Train Acc: {2:>6.2%},  Val Loss: {3:>5.2},  Val Acc: {4:>6.2%},  Time: {5} {6}'
                print(
                    msg.format(total_batch, loss.item(), train_acc, dev_loss,
                               dev_acc, time_dif, improve))
                writer.add_scalar("loss/train", loss.item(), total_batch)
                writer.add_scalar("loss/dev", dev_loss, total_batch)
                writer.add_scalar("acc/train", train_acc, total_batch)
                writer.add_scalar("acc/dev", dev_acc, total_batch)
                model.train()
            total_batch += 1
            if total_batch - last_improve > config.require_improvement:
                print("No optimization for a long time, auto-stopping...")
                flag = True
                break
        if flag:
            break
        scheduler.step()
    writer.close()
    test(config, model, test_iter)

Here is my model:

class LR_CNN(nn.Module):
    """
    Input shape:
        4D tensor with shape: (batch, K, pad_size, embed_dim)
     Output shape
        3D tensor with shape: (batch, K, len(filter_sizes)*num_filters)
    """
    def __init__(self, config) -> None:
        super(LR_CNN, self).__init__()
        filter_sizes = [int(i) for i in config.filter_sizes.split()]
        self.conv = nn.ModuleList([
            nn.Conv3d(1, config.num_filters, (1, i, config.embed))
            for i in filter_sizes
        ])
        self.relu = nn.ReLU()

    def conv_and_activate(self, x, conv):
        out = conv(x).squeeze(-1)
        out = F.max_pool2d(out, (1, out.size(3)))
        out = self.relu(out)
        out = out.squeeze(-1)
        out = out.permute(0, 2, 1)
        return out

    def forward(self, x):
        x = x.unsqueeze(1)
        out = torch.cat(
            [self.conv_and_activate(x, conv) for conv in self.conv], dim=2)
        return out


class Mid_CNN(nn.Module):
    """
    Input shape:
        3D tensor with shape: (batch, pad_size, embed_dim)
     Output shape
        2D tensor with shape: (batch, len(filter_sizes)*num_filters)
    """
    def __init__(self, config) -> None:
        super(Mid_CNN, self).__init__()
        filter_sizes = [int(i) for i in config.filter_sizes.split()]
        self.conv = nn.ModuleList([
            nn.Conv2d(1, config.num_filters, (i, config.embed))
            for i in filter_sizes
        ])
        self.relu = nn.ReLU()
        self.fc = nn.Linear(
            len(filter_sizes) * config.num_filters,
            len(filter_sizes) * config.num_filters)

    def conv_and_activate(self, x, conv):
        out = conv(x).squeeze(-1)
        out = F.max_pool1d(out, out.size(2))
        out = self.relu(out)
        return out.squeeze(-1)

    def forward(self, x):
        out = torch.cat(
            [self.conv_and_activate(x.unsqueeze(1), conv) for conv in self.conv], dim=1)
        out = self.fc(out)
        out = self.relu(out)
        return out.unsqueeze(1)


class LSTM(nn.Module):
    """
    Input shape:
        3D tensor with shape: (batch, K, len(filter_sizes)*num_filters)
     Output shape
        3D tensor with shape: (batch, K, num_directions * hidden_size)
    """
    def __init__(self, config) -> None:
        super(LSTM, self).__init__()
        # self.config = config
        input_size = len(config.filter_sizes.split()) * config.num_filters
        self.lstm = nn.LSTM(input_size,
                            config.hidden_size,
                            dropout=config.drop_out,
                            bidirectional=True,
                            batch_first=True,
                            num_layers=config.num_layers)

    def forward(self, x):
        if len(x.size()) == 2:
            x = x.unsqueeze(1)
        out, _ = self.lstm(x)
        return out


class Attention(nn.Module):
    """
    Input shape:
        3D tensor with shape: (batch, K, features)
     Output shape
        2D tensor with shape: (batch, features)
    """
    def __init__(self, config) -> None:
        super(Attention, self).__init__()
        self.w = nn.Parameter(torch.Tensor(config.hidden_size * 2).cuda(config.cuda_id))
        self.b = nn.Parameter(torch.Tensor(config.K).cuda(config.cuda_id))
        self.u = nn.Parameter(torch.Tensor(config.K, config.K).cuda(config.cuda_id))
        self._create_weight()

    def _create_weight(self, mean=0.0, std=0.05):
        self.w.data.normal_(mean, std)
        self.b.data.normal_(mean, std)
        self.u.data.normal_(mean, std)

    def forward(self, x):
        uit = torch.matmul(x, self.w)
        temp = uit
        temp += self.b
        uit = temp
        uit = torch.matmul(uit, self.u)
        uit = torch.tanh(uit)
        uit = torch.exp(uit)
        ait = torch.sum(uit, dim=1).unsqueeze(1)
        uit = torch.div(uit, ait).unsqueeze(2)
        res = x * uit
        return torch.sum(res, dim=1)


class CBA(nn.Module):

    def __init__(self, config) -> None:
        super(CBA, self).__init__()
        self.lr_cnn = LR_CNN(config)
        self.mid_cnn = Mid_CNN(config)
        self.lr_lstm = LSTM(config)
        self.mid_lstm = LSTM(config)
        self.lr_attention = Attention(config)
        self.fc = nn.Linear(2 * config.hidden_size, 2 * config.hidden_size)
        self.fc1 = nn.Linear(6 * config.hidden_size, 2)
        self.relu = nn.ReLU()

    def forward(self, left, right, mid):
        l_out = self.lr_cnn(left)
        r_out = self.lr_cnn(right)
        mid_out = self.mid_cnn(mid)
        l_out = self.lr_lstm(l_out)
        r_out = self.lr_lstm(r_out)
        mid_out = self.mid_lstm(mid_out)
        mid_out = self.fc(mid_out)
        mid_out = self.relu(mid_out).squeeze(1)
        l_out = self.lr_attention(l_out)
        r_out = self.lr_attention(r_out)
        out = torch.cat((torch.cat((l_out, mid_out), dim=1), r_out), dim=1)
        out = self.fc1(out)
        out = F.softmax(out, dim=-1)
        return out

Hi Ptrblck,

Sorry to take your time. I am running a GAN. When I apply backward() to the discriminator there is no error, but when I apply backward() to the generator the error arises (RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.)
The code is as follows:


def BorderLoss(pred):
    mask = torch.ones_like(pred).bool()
    mask[:,1:-1,1:-1,1:-1] = 0
    Border = pred[mask] - 1
    return torch.mean(torch.abs(Border))

    for Epoch in range(args.Epochs):

        for reals,background in zip(train_loader_reals,train_loader_background):
            reals,background = reals.to(device),background.to(device)
            realsLabel = torch.ones(reals.shape[0]).to(device)
            fakesLabel = torch.zeros(reals.shape[0]).to(device)
            
            z = torch.randn(reals.shape[0], nz,1, 1, 1).to(device)
            Mask = G(z,scalingG)
     
            fakes = Mask*background

            # Discriminator step
            D.zero_grad()

            realsPrediction = D(reals)
            fakesPrediction = D(fakes)

            lossOnReals = criterionDiscriminator(realsPrediction,realsLabel)

            lossOnFakes = criterionDiscriminator(fakesPrediction,fakesLabel)

            ## -- discriminator whole Loss--------------
            lossD = (1-β_border)*(lossOnReals + lossOnFakes)
 
            ## ---backward ----------
            lossD.backward()

            if iteUpdate % updateDiscrim == 0 :
                optimD.step()
                
        ## -----------Start train generator ----------------
            G.zero_grad()

            BL = BorderLoss(CMB)
            BCE_G = criterionDiscriminator(D(fakes),realsLabel)
            lossG =( (1-β_border)*BCE_G) + (β_border*BL)

            lossG.backward()
            optimG.step()

If I put lossD.backward(retain_graph=True), does it have a bad effect on training? Where is the issue? Before, I had no error.

The error is raised because you are passing fakes to the discriminator twice:
the first time when you are training D, and the second time when you are training G.
Since you don’t want to train G and D together in the first part of the code, you should pass the detached fake samples to D:

            Mask = G(z,scalingG)
     
            fakes = Mask*background

            # Discriminator step
            D.zero_grad()

            realsPrediction = D(reals)
            fakesPrediction = D(fakes.detach())

            lossOnReals = criterionDiscriminator(realsPrediction,realsLabel)

            lossOnFakes = criterionDiscriminator(fakesPrediction,fakesLabel)

import torch
import math

class LegendrePolynomial3(torch.autograd.Function):
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        retain_graph=True
        input = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

dtype = torch.float
device = torch.device('cpu')

#inputs
x = torch.linspace(-math.pi,math.pi,2000,device = device ,dtype=dtype)
y =torch.sin(x)

#weights
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

learning_rate = 5e-6

for t in range(2000):
    P3 = LegendrePolynomial3.apply
    # Forward pass: compute predicted y using operations; we compute
    # P3 using our custom autograd operation.
    y_pred = a + b * P3(c + d * x)

    # loss
    loss.backward(retain_graph=True)

    # update weights
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

I even tried removing and re-adding retain_graph = True.
What do I do?

Your code doesn’t show the loss calculation, so I guess you might be reusing it from a previous code snippet.

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier. :wink:

No, I am calculating it using loss.backward().
This is from the official PyTorch docs that I am re-implementing.

Your code snippet doesn’t show the loss calculation.
Could you post an executable code snippet or link to the tutorial, which raises this issue?
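
For reference, a sketch of how the tutorial’s loop recomputes the loss from a fresh forward pass each iteration, which is also why it needs no retain_graph=True. It assumes a working LegendrePolynomial3 as in the official example (with @staticmethod on forward/backward and input, = ctx.saved_tensors):

import math
import torch

P3 = LegendrePolynomial3.apply  # assumes the class from the post above, fixed up as in the tutorial

dtype = torch.float
device = torch.device('cpu')

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

learning_rate = 5e-6
for t in range(2000):
    y_pred = a + b * P3(c + d * x)

    # The loss comes from this fresh forward pass, so each backward()
    # walks its own graph and nothing has to be retained across iterations.
    loss = (y_pred - y).pow(2).sum()
    loss.backward()

    with torch.no_grad():
        for p in (a, b, c, d):
            p -= learning_rate * p.grad
            p.grad = None  # manually zero the gradient after the update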

Hi,
I have been reading through this thread and it has been pretty informative, but there is something I cannot figure out in my case. I am calling loss.backward() only once. Here is my code:

model.train()
optimizer.zero_grad()
loss, tb_dict, disp_dict = model_func(model, batch)
loss_u, _, _ = model_func(model, u_batch)
loss = loss+loss_u
loss.backward()

Still, I receive an error saying that I need to set retain_graph=True.

Just a little bit of background: the model used is the same. batch and u_batch are different, as I have different dataloaders for them. Both of these batches have different flags in them and they calculate different losses depending upon those flags.

The question is: as I am summing up the losses and calling backward() only once, why does it say that I need to retain the graph?

Would be really helpful if anyone can help!

Best Regards.
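
For what it’s worth, summing two losses from two independent forward passes and calling backward() once is fine in isolation; here is a toy sketch with a hypothetical stand-in model (not the actual model_func):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)               # stand-in for the real model
batch = torch.randn(8, 4)             # stand-in for batch
u_batch = torch.randn(8, 4)           # stand-in for u_batch

loss = model(batch).pow(2).mean()     # "labelled" loss
loss_u = model(u_batch).abs().mean()  # "unlabelled" loss
(loss + loss_u).backward()            # one backward over the combined graph: no error

So when this pattern fails in a larger script, the usual suspect is some tensor or hidden state created once and reused across iterations inside the model or the loss functions, not the summation itself.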

Hi community,
I can’t figure out what is wrong with my code. Likewise, I am getting the error at the line loss.backward():

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.

Code

model = MyModel() # a nn.Module
model = model.to(config["model"]["device"])

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=config["model"]["learning_rate"])

def run_epoch(dataloader, is_training=False):
    epoch_loss = 0

    if is_training:
        model.train()
    else:
        model.eval()

    for idx, (x, y) in enumerate(dataloader):
        if is_training:
            print("is_training")
            model.zero_grad()

        x = x.to(config["model"]["device"])
        y = y.to(config["model"]["device"])

        out = model(x)
        loss = criterion(out.contiguous(), y.contiguous())

        if is_training:
            loss.backward() # <-- RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.
            optimizer.step()

        epoch_loss += loss.detach().item()
        print(epoch_loss)

    return epoch_loss

for epoch in range(config["model"]["num_epoch"]):
    loss_train = run_epoch(train_dataloader, is_training=True)
    loss_val = run_epoch(val_dataloader)

Output:

is_training
77.78125
is_training
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.

I don’t see any obvious issues in your code. Could you add the missing definitions, so that we could execute it and reproduce the error?