Customized linear layer gives an error during the backward computation

Hi everyone,
I need to implement a customized version of a normalization layer.
The idea is the following:

  • after initialization we forward the dataset (or a single batch) through the network; similarly to batch norm, every time the module is called inside the model we compute the mean over the batch (first fundamental difference w.r.t. BatchNorm: we do not need the std in this case) and we save a copy of these mean tensors
  • from that moment on, we use the computed mean tensors to shift the forward-propagated tensor, i.e. we subtract from the forwarding tensors the mean tensors computed after initialization in the previous step (second fundamental difference w.r.t. BatchNorm: we always use the same constant, computed in that single forward pass, to offset the tensor, instead of computing it from the statistics of the batch selected each time)
import torch.nn as nn

class CustomInitBatchNorm2d(nn.Module):
    def __init__(self, layer_id):
        super().__init__()
        self.layer_id = layer_id
        self.stored_means = {}  # Dictionary to store means for each layer

    def forward(self, x):
        # On the first call in training mode, compute and store the batch mean
        if self.training and self.layer_id not in self.stored_means:
            self.stored_means[self.layer_id] = x.mean(dim=0, keepdim=True)

        # From then on, shift the input by the stored mean
        if self.layer_id in self.stored_means:
            return x - self.stored_means[self.layer_id]
        else:
            return x
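For context, a hypothetical sketch of how such a module could be dropped into a model; the Conv2d/ReLU layers, shapes and batch are placeholders, not taken from the original model:

import torch
import torch.nn as nn

# Hypothetical model: the conv/ReLU layers and shapes are placeholders
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    CustomInitBatchNorm2d(layer_id=0),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    CustomInitBatchNorm2d(layer_id=1),
)

init_batch = torch.randn(8, 3, 32, 32)  # stand-in for a real data batch
_ = model(init_batch)                   # first training-mode call stores the per-layer means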

Using the above module inside my model, I get an error during the backward pass; I don't understand why, as the function I define in the forward pass is a simple linear operation (just subtracting a constant tensor).

I tried to modify the class by adding a backward method as follows:

class CustomInitBatchNorm2d(nn.Module):
    def __init__(self, layer_id):
        super().__init__()
        self.layer_id = layer_id
        self.stored_means = {}  # Dictionary to store means for each layer
    
    def forward(self, x):
        if self.training and self.layer_id not in self.stored_means:
            self.stored_means[self.layer_id] = x.mean(dim=0, keepdim=True)
        
        if self.layer_id in self.stored_means:
            return x - self.stored_means[self.layer_id]
        else:
            return x

    def backward(self, grad_output):
        return grad_output  # Propagate gradients unchanged

but it is not working either.
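(Aside: a backward method defined directly on an nn.Module is never invoked by autograd, so the method added above is simply ignored. If a custom gradient were really needed, it would have to go through torch.autograd.Function; a hypothetical sketch of a shift with an identity gradient, not part of the original post:)

import torch

class ShiftByConstant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, shift):
        # shift is treated as a constant; nothing needs to be saved for backward
        return x - shift

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient w.r.t. x is the identity; None for the non-differentiable shift
        return grad_output, None

# usage: out = ShiftByConstant.apply(x, stored_mean)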
The exact error I get is:

  File ".../.local/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File ".../.local/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
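For reference, this RuntimeError generically means the backward pass is traversing graph nodes whose saved tensors were already freed by an earlier backward() call; a standalone toy example (unrelated to the model above) that triggers the same message:

import torch

w = torch.randn(3, requires_grad=True)
loss = torch.tanh(w).sum()
loss.backward()   # frees the graph's saved tensors
loss.backward()   # RuntimeError: Trying to backward through the graph a second time ...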

I am not an expert in PyTorch; do you have any idea of what is going on and how to fix it?

Could you explain what exactly is failing and post the error message with the stacktrace here, please?


Hi @ptrblck, sure!
It seems that something goes wrong during the gradient computation in the backward() step; I have reported the error I get at the end of the question.

Thanks for the update!
I guess this line:

self.stored_means[self.layer_id] = x.mean(dim=0, keepdim=True)

is storing the gradient history in self.stored_means, so you might need to .detach() the x.mean() tensor before assigning it.
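For illustration, a minimal sketch of the forward with that suggestion applied; only the .detach() call is new, everything else is the original module:

    def forward(self, x):
        if self.training and self.layer_id not in self.stored_means:
            # detach() cuts the mean off the current graph, so it is stored as a plain constant
            self.stored_means[self.layer_id] = x.mean(dim=0, keepdim=True).detach()

        if self.layer_id in self.stored_means:
            return x - self.stored_means[self.layer_id]
        else:
            return x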


@ptrblck thank you for your prompt reply! The problem was indeed there.