Hi,

I’ve noticed something peculiar when using PyTorch backward hook functions. Perhaps this was designed this way on purpose, but it makes backward hooks a bit unintuitive to use.

First off, a backward hook function would look something like this:

```
def hook_fn(module, inputs, outputs):
    # something here
```

where the function’s parameters are the module itself, the input gradients (a tuple), and the output gradients (a tuple).
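For reference, here is a minimal, self-contained version of what I’m doing (the layer sizes are arbitrary, and I register the hook with `register_backward_hook`, which my version of PyTorch already flags as deprecated):

```
import torch
import torch.nn as nn

captured = []  # grad tuples recorded by the hook, for inspection

def hook_fn(module, inputs, outputs):
    # Record the shape of each gradient in `inputs` so the ordering is visible.
    captured.append([tuple(g.shape) for g in inputs if g is not None])

layer = nn.Linear(4, 3, bias=True)  # arbitrary toy sizes
layer.register_backward_hook(hook_fn)

x = torch.randn(2, 4, requires_grad=True)
layer(x).sum().backward()
print(captured)  # shapes of the gradients, in whatever order the hook sees them
```

On my setup the printed shapes correspond to `(grad_bias, grad_preactivation, grad_weight)`, which is the ordering I describe below.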

**This is the part that confuses me.** The parameter `inputs` is a tuple containing the module’s gradients, but the order of its elements seems a bit… random. Not stochastic, just arbitrary. To elaborate, if we register this function on an `nn.Linear` layer with bias parameters, the tuple becomes:

```
(grad_bias, grad_preactivation, grad_weight)
```

So why is `grad_preactivation` (the gradient with respect to the previous layer’s activation) between `grad_bias` and `grad_weight`?

What makes it worse is that for convolutional layers this order is shuffled. For `nn.Conv2d` layers, the order seems to be:

```
(grad_preactivation, grad_weight, grad_bias)
```

Is there any documentation that explains this in depth (I can’t seem to find much), and are there better ways to intercept gradients in the backward pass? For example, say I want to modify the gradients of the weights in each layer, where some layers are `nn.Conv2d` and others are `nn.Linear`, and `bias` may be either true or false. Would I then need to hard-code my hook function to account for all possibilities, or is there a more elegant solution?
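To illustrate the kind of hard-coding I mean, below is a rough sketch of the best I’ve come up with: branch on the module type (and whether a bias is present) to locate the weight gradient inside the tuple, then return a modified tuple. Here `scale` is just a stand-in for whatever modification I actually want to apply:

```
import torch
import torch.nn as nn

def make_weight_grad_hook(scale=0.5):
    # Returns a hook that rescales only the weight gradient. The index of the
    # weight gradient inside `inputs` depends on the module type and on the
    # presence of a bias -- this is the hard-coding I would like to avoid.
    def hook_fn(module, inputs, outputs):
        grads = list(inputs)
        if isinstance(module, nn.Linear):
            idx = 2 if module.bias is not None else 1  # the order I observe
        elif isinstance(module, nn.Conv2d):
            idx = 1
        else:
            return None  # leave other modules untouched
        if grads[idx] is not None:
            grads[idx] = grads[idx] * scale
        return tuple(grads)
    return hook_fn

# Arbitrary toy model mixing both layer types.
model = nn.Sequential(nn.Conv2d(1, 2, 3), nn.Flatten(), nn.Linear(2 * 2 * 2, 5))
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        m.register_backward_hook(make_weight_grad_hook())

x = torch.randn(1, 1, 4, 4)
model(x).sum().backward()
```

This works in my quick tests, but the per-type index table feels fragile, which is why I’m asking whether there is a cleaner mechanism.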

Thanks in advance