Trying to backward through the graph a second time, but the buffers have already been freed


(biggie bigs) #1

Hi,

I am trying to do a classification problem using NNs, but I get the following error when calling backward():
Trying to backward through the graph a second time, but the buffers have already been freed
I can’t understand what’s wrong with my computation graph, since I am only passing the data through my model and then computing the loss.

def forward(self, x):
   output = torch.softmax(self.out.forward(x), dim=-1)
   return output

def train(self,some_params): 

   optim.zero_grad()

   output = self(train_data)
   likelihood = output.gather(1,train_labels[:, n].detach().long().unsqueeze(1))
   loss = -torch.log(likelihood)
   loss = torch.mean(loss)
   loss.backward()

(Intel Novel) #2

I think you should not call forward() inside forward(); that would be a recursion.


(biggie bigs) #3

self.out.forward(x) calls the forward method of an nn.Linear, so it isn’t a recursive call. I don’t think this affects the code.


(Devansh Bisla) #4

If you wish to backpropagate through the same graph a second time, pass retain_graph=True to the first backward() call. This preserves the intermediate buffers required to compute gradients.
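A minimal sketch of this (with a made-up toy tensor, not the poster’s model): retaining the graph on the first backward pass allows a second one over the same graph, with gradients accumulating in .grad.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

y.backward(retain_graph=True)  # intermediate buffers are kept alive
y.backward()                   # second pass works; grads accumulate in x.grad

# Without retain_graph=True, the second backward() would raise the
# "Trying to backward through the graph a second time" error.
```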


(Intel Novel) #5

Sounds interesting. I am still a novice. If out is a module, it has a forward() method; in that case you can call the module directly, like self.out(x), instead of calling forward() explicitly.
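For context, calling the module (layer(x)) goes through __call__, which runs any registered hooks before dispatching to forward(), whereas layer.forward(x) skips the hooks. With no hooks registered, both return the same result; a quick sketch with a standalone nn.Linear:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
x = torch.randn(1, 4)

# __call__ and forward() produce identical outputs when no hooks are set,
# but layer(x) is the recommended form since it respects hooks.
assert torch.equal(layer(x), layer.forward(x))
```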


(biggie bigs) #6

I know that, but I am getting this error on the very first run. And even if the loop ran more than once (supposing I didn’t have this initial error), things should still be fine, since I call backward() only once inside the for loop. Am I missing something?


#7

Could you post a small executable code snippet so that we can debug it?
It looks like the posted methods are part of a class implementation and I’m not sure how you are using them.


(biggie bigs) #8

It looks like this:

class SudokuSolver(nn.Module):
    def __init__(self, in_size, hidden_size, out_size):
        super().__init__()
        
        self.in_size = in_size
        self.hidden_size = hidden_size
        self.out_size = out_size
        
        #self.hidden = nn.Linear(in_size, hidden_size)
        #for now I am using a single layer
        self.out = nn.Linear(hidden_size, out_size)
        
    def forward(self, x):
        output = torch.softmax(self.out.forward(x), dim=-1)
        return output
    
    def train(self, train_data, train_labels, 
                    verbose=100, epochs=100, lr=0.1, l2_weight=0,
                    validation_data=None, validation_labels=None):
        
        optim = torch.optim.SGD(self.parameters(), lr=lr, weight_decay=l2_weight)
        train_loss = []
        validation_loss = []
        loss_fun = nn.MSELoss()
        
        print('start')
        for e in range(epochs):            
            for n in range(self.in_size):
              optim.zero_grad()
              
              output = self(train_data)
              likelihood = output.gather(1,train_labels[:, n].detach().long().unsqueeze(1))
              loss = -torch.log(likelihood)
              loss = torch.mean(loss)
              #here it drops the error(e = 0) 
              loss.backward()

              optim.step()
              #complete the sudoku
              j = torch.arange(train_data.shape[0])
              train_data[j, n] = train_labels[j, n].detach()

            train_loss.append(loss.detach().numpy())
            if verbose != 0 and e % verbose == 0:
                print(loss.detach())

I am trying to solve a sudoku using only nn.Linear layers. My approach is to predict every digit of the sudoku: after each prediction I add the correct digit to my training data and run the model again until the whole sudoku is filled. It stops after the first loss.backward().
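As an aside, unrelated to the error: the manual softmax → gather → -log → mean pipeline in the code above computes the same value as cross-entropy on the raw logits, which is the more numerically stable form. A sketch with hypothetical shapes (10 samples, 9 classes):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(10, 9)
labels = torch.randint(0, 9, (10,))

# Manual pipeline, as in the posted train() method
manual = -torch.log(torch.softmax(logits, dim=-1)
                    .gather(1, labels.unsqueeze(1))).mean()

# Equivalent built-in: log-softmax + NLL fused, numerically stable
builtin = F.cross_entropy(logits, labels)
```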


#9

If I just guess some input parameters and shapes, your code seems to work:

model = SudokuSolver(5, 5, 2)
train_data = torch.randn(10, 5)
train_labels = torch.randint(0, 2, (10, 10)).float()
model.train(train_data, train_labels, verbose=1)

Could you check what’s different and correct some parameters?


(biggie bigs) #10

I finally figured it out. I was setting requires_grad to True for my input tensor. I guess that its grad being empty caused this error, right?
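A minimal sketch of why that input setting can trigger the error (hypothetical tensors, not the sudoku model): when the input requires grad and is modified in place between iterations, every iteration stays attached to one shared graph, so the second backward() walks back into buffers that the first one already freed.

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
data = x.tanh()                    # non-leaf tensor, attached to x's graph
layer = torch.nn.Linear(3, 1)

layer(data).sum().backward()       # frees the graph's saved buffers

data[0, 0] = 0.0                   # in-place write, recorded on the same graph
try:
    layer(data).sum().backward()   # walks back into the already-freed graph
except RuntimeError as err:
    print("second backward failed:", err)

# With a plain input (requires_grad=False) each forward builds a fresh graph,
# so backward() can be called once per iteration without any error:
data = torch.randn(4, 3)
for _ in range(2):
    layer(data).sum().backward()
```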