Non-LSTM: Trying to backward through the graph a second time, but the buffers have already been freed

Note that, unlike other questions, this is not about any RNN structure. I want to create a piecewise-linear model whose gradient (slope) changes at manually supplied breakpoints, producing a plot like the one below.

The model that I have created is as follows:

import torch
import torch.nn as nn

class Trend(nn.Module):
    """Broken Trend model, with breakpoints as defined by the user."""
    def __init__(self, breakpoints):
        super().__init__()
        self.bpoints = breakpoints[None, :]
        self.init_layer = nn.Linear(1, 1)  # first linear bit
        # extract gradient and bias
        w = self.init_layer.weight
        b = self.init_layer.bias
        self.params = [[w, b]]  # save it to a buffer
        if len(breakpoints) > 0:
            # create deltas, which describe how the gradient changes
            deltas = torch.randn(len(breakpoints)) / len(breakpoints)  # initialisation
            self.deltas = nn.Parameter(deltas)  # make it a parameter
            for d, x1 in zip(self.deltas, breakpoints):
                y1 = w * x1 + b  # find the endpoint of the line segment (x1, y1)
                w = w + d        # add the delta onto the gradient
                b = y1 - w * x1  # find the new bias of the line segment
                self.params.append([w, b])  # add to the buffer

        # create buffer
        self.wb = torch.zeros(len(self.params), len(self.params[0]))

    def __copy2array(self):
        """Saves parameters into wb."""
        for i in range(self.wb.shape[0]):
            for j in range(self.wb.shape[1]):
                self.wb[i, j] = self.params[i][j]

    def forward(self, x):
        # get the line segment index (x_sec) for each x
        x_sec = x >= self.bpoints
        x_sec = x_sec.sum(1)
        self.__copy2array()  # copy parameters across into the matrix
        # get the final prediction y = wx + b for the relevant section
        return x * self.wb[x_sec][:, :1] + self.wb[x_sec][:, 1:]

However, once I attempt to train it, I get the error: RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
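For context, a minimal loop along these lines is enough to trigger the error on the second iteration (the breakpoints, targets, and optimiser settings here are placeholders, not my real setup):

model = Trend(torch.tensor([100.0, 350.0, 500.0]))  # placeholder breakpoints
optimiser = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

time = torch.arange(700).float()[:, None]
y_true = torch.randn(700, 1)  # placeholder targets

for epoch in range(10):
    optimiser.zero_grad()
    loss = criterion(model(time), y_true)
    loss.backward()  # raises the RuntimeError on the second iteration
    optimiser.step()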

I obtained the above plot by doing:

import matplotlib.pyplot as plt

time = torch.arange(700).float()[:, None]
y_pred = model(time)
plt.plot(time, y_pred.detach().numpy())

So we know the forward pass works as expected; however, the backward pass does not. I was wondering what I need to change to get it working.

If you’re wondering why __copy2array is used: when I tried torch.Tensor(self.params), it destroyed the gradients in those parameters.
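In case it helps, here is a toy illustration of what I mean by the gradients being destroyed (standalone tensors, not my actual model):

import torch
import torch.nn as nn

w = nn.Parameter(torch.randn(1))
b = nn.Parameter(torch.randn(1))

stacked = torch.stack([w, b])
print(stacked.grad_fn)    # StackBackward0 -> still part of the autograd graph

rewrapped = torch.Tensor([w.item(), b.item()])
print(rewrapped.grad_fn)  # None -> values copied, graph lost

Thanks in advance.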


The problem comes from the fact that you perform some computations with your parameters in the __init__ method and then use the results in every forward. This means that the part of the computation done in __init__ is shared across every forward pass, hence the error: you backward through that part of the graph multiple times, but its buffers are freed after the first backward.

In short, you perform computations in __init__ that should be in forward.
Indeed, the bias of each section changes whenever the learnt slopes change, so you cannot precompute the biases.
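As a rough sketch (untested, keeping your structure, and assuming breakpoints is a non-empty 1-D tensor), moving the segment computation into forward and stacking the results instead of copying values into a pre-allocated buffer would look something like this:

import torch
import torch.nn as nn

class Trend(nn.Module):
    """Broken Trend model; segment parameters are rebuilt on every forward."""
    def __init__(self, breakpoints):
        super().__init__()
        self.bpoints = breakpoints[None, :]
        self.init_layer = nn.Linear(1, 1)
        # deltas parameterise how the gradient changes at each breakpoint
        deltas = torch.randn(len(breakpoints)) / len(breakpoints)
        self.deltas = nn.Parameter(deltas)

    def forward(self, x):
        # recompute every segment's (w, b) from the current parameter values,
        # so each forward builds a fresh graph for backward to consume
        w = self.init_layer.weight.squeeze()
        b = self.init_layer.bias.squeeze()
        ws, bs = [w], [b]
        for d, x1 in zip(self.deltas, self.bpoints[0]):
            y1 = w * x1 + b  # endpoint of the current segment (x1, y1)
            w = w + d        # new gradient after the breakpoint
            b = y1 - w * x1  # new bias keeps the line continuous
            ws.append(w)
            bs.append(b)
        W = torch.stack(ws)  # stacking keeps the parameters in the graph,
        B = torch.stack(bs)  # unlike writing their values into a buffer
        x_sec = (x >= self.bpoints).sum(1)  # segment index for each sample
        return x * W[x_sec][:, None] + B[x_sec][:, None]

That way, nothing that depends on the parameters is cached between iterations, so every backward consumes its own graph and retain_graph=True is not needed.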
