Different forward and backward weights

Thank you for replying.

I am familiar with the chain rule, but I don't know where exactly the gradients of the loss w.r.t. the parameters are computed in code when I use a custom autograd Function, as described in the Extending PyTorch tutorial.
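
For reference, this is roughly the pattern from that tutorial that I have in mind (the simplified linear example, only as a minimal sketch; the gradient formulas here apply just to this toy linear layer, and the comments reflect my own understanding):

import torch

class LinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias):
        ctx.save_for_backward(input, weight, bias)
        return input.mm(weight.t()) + bias

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        # As I understand it, these lines are "where" the gradients of the
        # loss w.r.t. the input and the parameters get computed:
        grad_input = grad_output.mm(weight)      # dL/dinput
        grad_weight = grad_output.t().mm(input)  # dL/dweight
        grad_bias = grad_output.sum(0)           # dL/dbias
        return grad_input, grad_weight, grad_bias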

Could you please look at the code snippet below, in which I have commented my doubts? I think this is a better way to explain them :sweat_smile:

class Custom_Convolution(torch.autograd.Function):    
    
    @staticmethod
    def forward(ctx, input, weight, bias, stride, padding):  # input's shape = ([batch_size=100, 96, 8, 8])
        output = torch.nn.functional.conv2d(input, weight, bias, stride, padding)  
        ctx.save_for_backward(input, weight, bias, output)
        return output    # output's shape = ([batch_size=100, 128, 4, 4])
    
    @staticmethod
    def backward(ctx, grad_output):  # grad_output's shape = ([batch_size=100, 128, 4, 4])
        input, weight, bias, output = ctx.saved_tensors  # input's shape = ([batch_size=100, 96, 8, 8])
        print("op:",output.shape, output.requires_grad, output.grad_fn )
## It shows that output requires grad and its grad_fn = <torch.autograd.function.Custom_ConvolutionBackward object at ...>

## I am cloning the output because I think that working on it directly would
## overwrite the gradients of the already existing output tensor, which might
## affect the further calculations of grad_input, grad_weight and grad_bias.
## PLEASE CORRECT ME IF I AM WRONG.

        features = output.clone()
        print("op2:",features.requires_grad, features.grad_fn )
## It prints: False, None !!!!
## HOW CAN I RETAIN THE PAST HISTORY OF output SO THAT THE CLONE STILL
## REQUIRES GRAD AND HAS A grad_fn??

        features = features.view(features.shape[0], features.shape[1], -1)
        
        # Total_features = features.shape[0] * features.shape[1]

        cont_loss = torch.tensor([0.]).requires_grad_(True).to(dev)  # shape: ([1])
        for ...:
            # My code for the loss... it includes operations like torch.div, torch.exp, torch.sum, ...
            # Calculation of the loss for each feature 'i': Li
            # cont_loss += Li   (number of Li values = features.shape[0] * features.shape[1])

I want to backpropagate from cont_loss to features (i.e. output) and then from features to the weight tensor.
So when I use torch.autograd.grad(outputs=cont_loss, inputs=weight, retain_graph=True),

I am getting RuntimeErrors like:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

or the one saying that one of the tensors used in the computational graph either does not require grad or has no grad_fn.
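
In case a concrete, self-contained example helps, here is a stripped-down sketch of what I am doing. The sizes are made up, it runs on CPU only, and the per-feature loss Li below is just a placeholder for my real contrastive loss, but I believe it shows the same situation: the autograd.grad call inside backward is where it fails for me.

import torch
import torch.nn.functional as F

class Custom_Convolution(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias, stride, padding):
        output = F.conv2d(input, weight, bias, stride, padding)
        ctx.save_for_backward(input, weight, bias, output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias, output = ctx.saved_tensors
        features = output.clone()                 # requires_grad is already False here
        features = features.view(features.shape[0], features.shape[1], -1)

        cont_loss = torch.tensor([0.], requires_grad=True)
        for i in range(features.shape[1]):
            Li = features[:, i, :].exp().sum()    # placeholder for my real per-feature loss
            cont_loss = cont_loss + Li

        # This is the call that raises the RuntimeError:
        grad_w = torch.autograd.grad(outputs=cont_loss, inputs=weight, retain_graph=True)

        # ... the rest of my backward would compute grad_input, grad_weight, grad_bias ...
        return None, None, None, None, None       # placeholder returns

x = torch.randn(4, 96, 8, 8)
w = torch.randn(128, 96, 5, 5, requires_grad=True)
b = torch.randn(128, requires_grad=True)

out = Custom_Convolution.apply(x, w, b, 1, 0)     # stride=1, padding=0
out.sum().backward()                              # ends up in backward() above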