# Different forward and backward weights

I have a use case where I need to use a different set of weights to compute the backward pass. One instance where this is used is in this work https://www.nature.com/articles/ncomms13276 and numerous follow-up works, or this https://www.siarez.com/projects/random-backpropogation

Is there any way to do this with the current API? If not, what would be the best angle of attack?

Thanks

You could try to load the backward `state_dict` before executing the `backward` operation, but it’s quite a hacky way:

```python
import copy

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Linear(1, 1, bias=False)
)

with torch.no_grad():
    model[0].weight.fill_(1.)
    model[1].weight.fill_(1.)

sd_forward = copy.deepcopy(model.state_dict())
sd_backward = copy.deepcopy(sd_forward)
sd_backward['0.weight'].fill_(10.)
sd_backward['1.weight'].fill_(10.)

# one train step: forward with the forward weights,
# then swap in the backward weights before calling backward()
output = model(torch.ones(1, 1))
model.load_state_dict(sd_backward)
output.mean().backward()

print(model[0].weight.grad)
> tensor([[10.]])
print(model[1].weight.grad)
> tensor([[1.]])
```

Also note that the last gradient is wrong, since the output was calculated using the old (forward) weights.
Would this approach work for you, or did I misunderstand your question?

@ptrblck Thanks for your reply. I actually figured out the right way to do this. I basically wrote my own autograd `Function` class, similar to the one here: https://pytorch.org/docs/stable/notes/extending.html
Then, inside its `backward` method, I used a different set of weights to compute `grad_input`. The backward now looks like this:

```python
    @staticmethod
    def backward(ctx, grad_output):
        input, weight, b_weights, bias = ctx.saved_tensors
        # propagate the gradient to the input with the fixed backward
        # weights instead of the forward weights
        grad_input = grad_output.mm(b_weights)
        grad_weight = grad_output.t().mm(input)
        grad_bias = grad_output.sum(0) if bias is not None else None
        return grad_input, grad_weight, None, grad_bias
```

`b_weights` are the backward weights that are passed to the `forward` function and saved in `ctx`.
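For reference, the whole `Function` might look like the following self-contained sketch. The class name and the linear-layer math are my own illustration (following the Extending PyTorch tutorial, not the poster's exact code); the key line is `grad_input = grad_output.mm(b_weights)`:

```python
import torch
from torch.autograd import Function

class FeedbackAlignmentLinear(Function):
    """Linear layer that uses a fixed matrix `b_weights` in the backward pass."""

    @staticmethod
    def forward(ctx, input, weight, b_weights, bias=None):
        ctx.save_for_backward(input, weight, b_weights, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output = output + bias
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, b_weights, bias = ctx.saved_tensors
        # grad_input uses the fixed backward weights, not `weight`
        grad_input = grad_output.mm(b_weights)
        grad_weight = grad_output.t().mm(input)
        grad_bias = grad_output.sum(0) if bias is not None else None
        # b_weights is fixed, so it receives no gradient (None)
        return grad_input, grad_weight, None, grad_bias

# usage
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(2, 3, requires_grad=True)
b_w = torch.randn(2, 3)  # fixed random backward weights
out = FeedbackAlignmentLinear.apply(x, w, b_w)
out.sum().backward()
```

After `backward()`, `x.grad` equals `grad_output @ b_w` rather than `grad_output @ w`, while `w.grad` is still the usual weight gradient.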


That looks like a good approach! Thanks for sharing it! @ptrblck, I have a doubt; it might be silly to ask 🤐.

The `weight` and `bias` gradients would be the gradients of the loss w.r.t these parameters.
The input and output gradients are calculated through the chain rule to forward the gradient to the next layer (previous layer during the forward pass).

I am getting confused, because when I checked the documentation behind

``````grad_input = torch.nn.grad.conv2d_input(input.shape, weight, grad_output)
``````

I came to know that `input` is the input to the (convolution) layer, and [this](https://github.com/pytorch/pytorch/blob/master/torch/nn/grad.py) code file also says that the `conv2d_weight` function (line 170) computes the gradient of the output of the convolution with respect to the weight of the convolution.
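As a quick sanity check (a small experiment of my own, not from the thread): the `torch.nn.grad` helpers reproduce exactly the gradients that autograd computes for a convolution, once the incoming `grad_output` is supplied:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)

out = F.conv2d(x, w)
grad_out = torch.ones_like(out)  # stands in for dLoss/dOutput
out.backward(grad_out)

# gradients w.r.t. the conv input and weight, computed explicitly
gi = torch.nn.grad.conv2d_input(x.shape, w, grad_out)
gw = torch.nn.grad.conv2d_weight(x, w.shape, grad_out)

print(torch.allclose(gi, x.grad, atol=1e-5))
print(torch.allclose(gw, w.grad, atol=1e-4))
```

So these functions compute "the gradient of the output w.r.t. input/weight" already multiplied by `grad_output`, i.e. the chain-rule product, which is why they match `x.grad` and `w.grad`.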

So, where is the gradient of the loss w.r.t. the parameters calculated, if you take for example the `class LinearFunction(Function)` from the Extending PyTorch tutorial?

I’m not sure if I understand the question properly, but you would apply the chain rule and thus the conv output would be used.
The general workflow of the chain rule and backpropagation is explained e.g. in CS231n - Optimization.
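To make the chain-rule point concrete, here is a tiny hand-checkable example of my own: `grad_output` arriving in `backward` already contains dLoss/dOutput, so multiplying it with dOutput/dWeight yields dLoss/dWeight, which is exactly what ends up in `weight.grad`:

```python
import torch

# loss = mean((x @ W.t())**2), a single scalar
x = torch.tensor([[1., 2.]])
W = torch.tensor([[3., 4.]], requires_grad=True)

out = x.mm(W.t())          # out = 1*3 + 2*4 = 11
loss = (out ** 2).mean()   # loss = 121
loss.backward()

# chain rule: dloss/dW = (dloss/dout) * (dout/dW)
#           = 2 * out   *  x  = 22 * [1, 2]
print(W.grad)  # tensor([[22., 44.]])
```

In the tutorial's `LinearFunction`, that same product is the line `grad_weight = grad_output.t().mm(input)`; the loss itself never appears in the `Function`, it only enters through `grad_output`.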

I am familiar with the chain rule, but I don’t know where exactly the gradients of the loss w.r.t. the parameters are calculated in code, if I use an autograd function as described in the Extending PyTorch tutorial.

Can you please look at the code snippet below, in which I have commented my doubts? I think this would be a better way to clarify them.

```python
class Custom_Convolution(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input, weight, bias, stride, padding):  # input's shape = ([batch_size=100, 96, 8, 8])
        output = torch.nn.functional.conv2d(input, weight, bias, stride, padding)
        ctx.save_for_backward(input, weight, bias, output)
        return output    # output's shape = ([batch_size=100, 128, 4, 4])

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias, output = ctx.saved_tensors  # input's size = ([batch_size=100, 96, 8, 8])

        ## I am cloning the output because I think it will override the gradients
        ## of the already existing output tensor, which may affect further calculations.
        ## PLEASE CORRECT ME IF I AM WRONG.
        features = output.clone()

        print(features.requires_grad, features.grad_fn)
        ## It prints: False, None  !!!!
        ## HOW CAN I RETAIN THE PAST HISTORY OF output SO THAT IT STILL
        ## HAS A grad_fn?

        features = features.view(features.shape[0], features.shape[1], -1)

        # Total_features = features.shape[0] * features.shape[1]

        for ..... :
            # My code for the loss... includes some operations like torch.div, exp, sum...
            # Calculation of the loss for each feature 'i': Li
            # cont_loss += Li  (number of Li values = features.shape[0] * features.shape[1])
```

I want to backpropagate from `cont_loss` to `features` (i.e. `output`), and then from `features` to the `weight` tensor.
So, when I use `torch.autograd.grad(outputs=cont_loss, inputs=weight, retain_graph=True)`,

I am getting `RuntimeError`s like
`RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior`, or that one of the tensors used in the computational graph either does not require a gradient or has no gradient function.

Based on your comments it seems you would like to apply something like second-order gradients, since you want to create `grad_fn`s inside the `backward`. If that’s the case, enable the gradient calculation via `with torch.enable_grad():`.
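A minimal sketch of that pattern (my own illustrative example, using a squaring function instead of the convolution above): grad mode is disabled inside `backward` by default, so a graph for an internal loss has to be rebuilt under `torch.enable_grad()`:

```python
import torch
from torch.autograd import Function

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # Grad mode is off inside backward by default, so tensors created
        # here would have no grad_fn; enable_grad builds a local graph.
        with torch.enable_grad():
            xd = x.detach().requires_grad_()
            inner = (xd ** 3).sum()                      # some internal loss
            g_inner = torch.autograd.grad(inner, xd)[0]  # 3 * x**2
        # combine the usual chain-rule gradient with the internal one
        return grad_output * 2 * x + g_inner

x = torch.tensor([2.0], requires_grad=True)
Square.apply(x).sum().backward()
print(x.grad)  # 2*2 + 3*4 = tensor([16.])
```

Note the `detach().requires_grad_()` step: it makes the saved tensor a fresh leaf of the local graph, so `torch.autograd.grad` can differentiate the internal loss without touching the outer graph.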

Hi, I modified my code, but this is still happening.