RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time. Help appreciated!

I am trying to calculate the mutual information between the hidden layers' outputs and the network's input and output, using the following code:

import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

def InfoNCE(X, Y, batch_size=256, num_epochs=200, dev=torch.device("cpu"), model=None, rg=True):
    A = torch.tensor([float(batch_size)] * batch_size).reshape(batch_size, 1)  # .cuda()
    if not model:
        model = nn.Sequential(
            nn.Linear(X.shape[1] + Y.shape[1], 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )

    # Move data to device
    X     = X.to(dev)
    Y     = Y.to(dev) + torch.randn_like(Y) * 1e-4
    model = model.to(dev)

    opt   = optim.SGD(model.parameters(), lr=0.03, momentum=0.9)
    td    = TensorDataset(X, Y)

    result = []
    for epoch in range(num_epochs):
        for x, y in DataLoader(td, batch_size, shuffle=True, drop_last=True):
            opt.zero_grad()

            top    = model(torch.cat([x, y], 1)).flatten()
            xiyj   = torch.cat([x.repeat_interleave(batch_size, dim=0), y.repeat(batch_size, 1)], 1)
            bottom = torch.logsumexp(model(xiyj).reshape(batch_size, batch_size), 1) - A.log()

            loss   = -(top - bottom).mean()

            result.append(-loss.item())

            loss.backward(retain_graph=rg)
            opt.step()
            r = torch.tensor(result[-20:]).mean()
    # plt.plot(result)
    print(r)
    return r

InfoNCE(dataset.x, layer_2_log[1])
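
For context, the loss in the inner loop is meant to be the negative of the InfoNCE lower bound on I(X; Y), where f denotes the critic model and N the batch size:

    I(X; Y) \ge \mathbb{E}\big[\, f(x_i, y_i) - \log \tfrac{1}{N} \sum_{j=1}^{N} \exp f(x_i, y_j) \,\big]

so top corresponds to f(x_i, y_i) and bottom to the log-mean-exp term over the batch.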

I tried setting retain_graph to both True and False after reading some posts, but it still gives the runtime error either way.

Any help would be appreciated!

Hi,

Which version of PyTorch are you using? Some autograd fixes were added to master last week that might be related. Do you see the same behavior if you use a nightly build?
If you still do, could you write a small script that I could run to reproduce the error?

Thank you so much for answering!

  1. 3.6.9
  2. How do I check whether I have a nightly build? I am running in an environment that I created using conda.
  3. https://drive.google.com/file/d/1MvO3DYE_W0pJfcnBb7Yosfp1M2943lil/view?usp=sharing

Hi, you can check your version by printing torch.__version__ and reporting it here.
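
For example (the exact string depends on your install; nightly builds usually carry a .dev suffix in the version):

    import torch
    print(torch.__version__)  # e.g. '1.2.0' for a stable release, something like '1.4.0.devYYYYMMDD' for a nightly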

For your code sample, what are the sizes of the objects in the dataset? Could you replace the p.load() with just a big torch.rand(your_size) so that I can run it without your dataset?

  1. 1.2.0
  2. Sorry that I forgot to share the dataset. Here you go: https://drive.google.com/file/d/1lMHCbVqdPFI21hsy8Rqd_oB1nHVdEsn5/view?usp=sharing
  1. Can you try installing the PyTorch preview (nightly) build from this page and check if it still happens?
  2. Thanks! I asked for random inputs because that way we don't have to download the dataset and move it around on our machines to reproduce :wink: and it's much simpler for us.
  1. My bad. The system is not allowing me to reply with a share link because it thinks it is promotional. Can you comment out lines 10 and 11 and change lines 13 and 14 to:

x_train = torch.rand([5198, 22])  # d['mushrooms'][0][0:5198]
y_train = torch.rand([5198, 1])   # d['mushrooms'][1][0:5198].view(5198, -1)

  1. I tried the preview. It still happens.

Oh, looking at the code in a nice editor made me realise that you pass layer_2_log to your second training stage, but that Tensor already has history from the first training stage. Is it expected that the backward will also go through the first part of the model? If not, you should add a .detach() (either when saving it in the first training loop or when passing it to the InfoNCE function) so that its history is not kept.
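
For instance, a minimal sketch of the fix, assuming layer_2_log[1] is an activation tensor recorded during the first training stage:

    # Detach so the InfoNCE backward does not try to walk back through the
    # (already freed) graph of the first model's forward pass.
    InfoNCE(dataset.x, layer_2_log[1].detach())

    # Alternatively, detach when recording the activation in the first training loop,
    # e.g. layer_2_log.append(h2.detach())  # h2 is a hypothetical name for the layer-2 output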

I see what you are talking about! Thank you so much!