# Different forward and backward weights

I have a use case where I need to use a different set of weights to compute the backward pass. One instance where this is used is in this work https://www.nature.com/articles/ncomms13276 and numerous follow-up works, or this https://www.siarez.com/projects/random-backpropogation

Is there any way to do this with the current API? If not, what would be the best angle of attack?

Thanks

You could try to load the backward `state_dict` before executing the `backward` operation, but it’s quite a hacky way:

```python
import copy

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 1, bias=False),
    nn.Linear(1, 1, bias=False)
)

with torch.no_grad():
    model[0].weight.fill_(1.)
    model[1].weight.fill_(1.)

sd_forward = copy.deepcopy(model.state_dict())
sd_backward = copy.deepcopy(sd_forward)
sd_backward['0.weight'].fill_(10.)
sd_backward['1.weight'].fill_(10.)

# one train step: forward with the forward weights,
# then swap in the backward weights before calling backward()
output = model(torch.ones(1, 1))
model.load_state_dict(sd_backward)
output.mean().backward()

print(model[0].weight.grad)
> tensor([[10.]])
print(model[1].weight.grad)
> tensor([[1.]])
```

Also note that the last gradient is wrong, since the output was calculated using the old (forward) weights.
Would this approach work for you, or did I misunderstand your question?

@ptrblck Thanks for your reply. I actually figured out the right way to do this. I basically wrote my own autograd `Function` class, similar to the one here: https://pytorch.org/docs/stable/notes/extending.html
Then, inside its `backward` method, I used a different set of weights to compute `grad_input`. The backward now looks like this:

```python
    @staticmethod
    def backward(ctx, grad_output):
        input, weight, b_weights, bias = ctx.saved_tensors
        # propagate the gradient to the input with the fixed backward
        # weights instead of the forward weights
        grad_input = grad_output.mm(b_weights)
        grad_weight = grad_output.t().mm(input)
        grad_bias = grad_output.sum(0) if bias is not None else None
        return grad_input, grad_weight, None, grad_bias
```

`b_weights` are the backward weights that are passed to the `forward` function and saved in `ctx`.
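For reference, the whole `Function` might look like the following self-contained sketch. The class name and the linear-layer math are my own illustration (following the Extending PyTorch tutorial, not the poster's exact code); the key line is `grad_input = grad_output.mm(b_weights)`:

```python
import torch
from torch.autograd import Function

class FeedbackAlignmentLinear(Function):
    """Linear layer that uses a fixed matrix `b_weights` in the backward pass."""

    @staticmethod
    def forward(ctx, input, weight, b_weights, bias=None):
        ctx.save_for_backward(input, weight, b_weights, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output = output + bias
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, b_weights, bias = ctx.saved_tensors
        # grad_input uses the fixed backward weights, not `weight`
        grad_input = grad_output.mm(b_weights)
        grad_weight = grad_output.t().mm(input)
        grad_bias = grad_output.sum(0) if bias is not None else None
        # b_weights is fixed, so it receives no gradient (None)
        return grad_input, grad_weight, None, grad_bias

# usage
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(2, 3, requires_grad=True)
b_w = torch.randn(2, 3)  # fixed random backward weights
out = FeedbackAlignmentLinear.apply(x, w, b_w)
out.sum().backward()
```

After `backward()`, `x.grad` equals `grad_output @ b_w` rather than `grad_output @ w`, while `w.grad` is still the usual weight gradient.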


That looks like a good approach! Thanks for sharing it! @ptrblck, I have a doubt; it might be silly to ask 🤐.

The `weight` and `bias` gradients would be the gradients of the loss w.r.t these parameters.
The input and output gradients are calculated through the chain rule to forward the gradient to the next layer (previous layer during the forward pass).

I am getting confused, because when I checked the documentation behind

``````grad_input = torch.nn.grad.conv2d_input(input.shape, weight, grad_output)
``````

I came to know that `input` is the input to the (convolution) layer, and [this](https://github.com/pytorch/pytorch/blob/master/torch/nn/grad.py) code file also says that the `conv2d_weight` function (line 170) computes the gradient of the output of the convolution with respect to the weight of the convolution.
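As a quick sanity check (a small experiment of my own, not from the thread): the `torch.nn.grad` helpers reproduce exactly the gradients that autograd computes for a convolution, once the incoming `grad_output` is supplied:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)

out = F.conv2d(x, w)
grad_out = torch.ones_like(out)  # stands in for dLoss/dOutput
out.backward(grad_out)

# gradients w.r.t. the conv input and weight, computed explicitly
gi = torch.nn.grad.conv2d_input(x.shape, w, grad_out)
gw = torch.nn.grad.conv2d_weight(x, w.shape, grad_out)

print(torch.allclose(gi, x.grad, atol=1e-5))
print(torch.allclose(gw, w.grad, atol=1e-4))
```

So these functions compute "the gradient of the output w.r.t. input/weight" already multiplied by `grad_output`, i.e. the chain-rule product, which is why they match `x.grad` and `w.grad`.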

So, where is the gradient of the loss w.r.t. the parameters calculated, if you take for example the `class LinearFunction(Function)` from the Extending PyTorch tutorial?

I’m not sure if I understand the question properly, but you would apply the chain rule and thus the conv output would be used.
The general workflow of the chain rule and backpropagation is explained e.g. in CS231n - Optimization.
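To make the chain-rule point concrete, here is a tiny hand-checkable example of my own: `grad_output` arriving in `backward` already contains dLoss/dOutput, so multiplying it with dOutput/dWeight yields dLoss/dWeight, which is exactly what ends up in `weight.grad`:

```python
import torch

# loss = mean((x @ W.t())**2), a single scalar
x = torch.tensor([[1., 2.]])
W = torch.tensor([[3., 4.]], requires_grad=True)

out = x.mm(W.t())          # out = 1*3 + 2*4 = 11
loss = (out ** 2).mean()   # loss = 121
loss.backward()

# chain rule: dloss/dW = (dloss/dout) * (dout/dW)
#           = 2 * out   *  x  = 22 * [1, 2]
print(W.grad)  # tensor([[22., 44.]])
```

In the tutorial's `LinearFunction`, that same product is the line `grad_weight = grad_output.t().mm(input)`; the loss itself never appears in the `Function`, it only enters through `grad_output`.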

I am familiar with the chain rule, but I don’t know where exactly the gradients of the loss w.r.t. the parameters are calculated in code, if I use an autograd function as described in the Extending PyTorch tutorial.

Can you please look at the code snippet below, in which I have commented my doubts? I think this would be a better way to clarify them.

```python
class Custom_Convolution(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input, weight, bias, stride, padding):  # input's shape = ([batch_size=100, 96, 8, 8])
        output = torch.nn.functional.conv2d(input, weight, bias, stride, padding)
        ctx.save_for_backward(input, weight, bias, output)
        return output    # output's shape = ([batch_size=100, 128, 4, 4])

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias, output = ctx.saved_tensors  # input's size = ([batch_size=100, 96, 8, 8])

        ## I am cloning the output because I think it will override the gradients
        ## of the already existing output tensor, which may affect further calculations.
        ## PLEASE CORRECT ME IF I AM WRONG.
        features = output.clone()

        print(features.requires_grad, features.grad_fn)
        ## It prints: False, None  !!!!
        ## HOW CAN I RETAIN THE PAST HISTORY OF output SO THAT IT STILL
        ## HAS A grad_fn?

        features = features.view(features.shape[0], features.shape[1], -1)

        # Total_features = features.shape[0] * features.shape[1]

        for ..... :
            # My code for the loss... includes some operations like torch.div, exp, sum...
            # Calculation of the loss for each feature 'i': Li
            # cont_loss += Li  (number of Li values = features.shape[0] * features.shape[1])
```

I want to backpropagate from `cont_loss` to `features` (i.e. `output`), and then from `features` to the `weight` tensor.
So, when I use `torch.autograd.grad(outputs=cont_loss, inputs=weight, retain_graph=True)`,

I am getting `RuntimeError`s like
`RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior`, or that one of the tensors used in the computational graph either does not require a gradient or has no gradient function.

Based on your comments it seems you would like to apply something like second-order gradients, since you want to create `grad_fn`s inside the `backward`. If that’s the case, enable the gradient calculation via `with torch.enable_grad():`.
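A minimal sketch of that pattern (my own illustrative example, using a squaring function instead of the convolution above): grad mode is disabled inside `backward` by default, so a graph for an internal loss has to be rebuilt under `torch.enable_grad()`:

```python
import torch
from torch.autograd import Function

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # Grad mode is off inside backward by default, so tensors created
        # here would have no grad_fn; enable_grad builds a local graph.
        with torch.enable_grad():
            xd = x.detach().requires_grad_()
            inner = (xd ** 3).sum()                      # some internal loss
            g_inner = torch.autograd.grad(inner, xd)[0]  # 3 * x**2
        # combine the usual chain-rule gradient with the internal one
        return grad_output * 2 * x + g_inner

x = torch.tensor([2.0], requires_grad=True)
Square.apply(x).sum().backward()
print(x.grad)  # 2*2 + 3*4 = tensor([16.])
```

Note the `detach().requires_grad_()` step: it makes the saved tensor a fresh leaf of the local graph, so `torch.autograd.grad` can differentiate the internal loss without touching the outer graph.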

Hi, I modified my code, but this is still happening.