Hi! I am urgently seeking advice as I am not sure if I computed a certain gradient correctly.

**Setting:**

In my toy model I have a neural net called `Phi_net` (input data is 4d and output is 1d) and another neural net `D_net` (input data is 2d and output is 4d). Both have only one linear layer. I implemented them in the following way with weights initialized manually:

```
import torch
import torch.nn as nn


class Phi_Net(nn.Module):
    def __init__(self):
        super(Phi_Net, self).__init__()
        self.fc = nn.Linear(4, 1)
        # Manually set the weights and zero the bias
        with torch.no_grad():
            self.fc.weight.data = torch.tensor([[1., 2., 3., 4.]])
            self.fc.bias.fill_(0.)

    def forward(self, x):
        return self.fc(x)


class D_Net(nn.Module):
    def __init__(self):
        super(D_Net, self).__init__()
        self.fc = nn.Linear(2, 4)
        # Manually set the weights and zero the bias
        with torch.no_grad():
            self.fc.bias.fill_(0)
            self.fc.weight.data = torch.tensor([[1., 0.],
                                                [0., 5.],
                                                [1., 0.],
                                                [1., 0.]])

    def forward(self, z):
        return self.fc(z)
```

I then want to do the following: I set z = [1.,5.] and compute the squared cosine similarity between the gradient of `Phi_net` w.r.t. its input, evaluated at x = D(z), and the partial derivative of `D_net` w.r.t. z_1, evaluated at z = [1.,5.]. In the toy example this is all trivial, but I eventually want to do this for more complex neural nets `Phi_net` and `D_net`.

I do not know how to integrate LaTeX into this post, but I took a screenshot of the quantity that I want to compute, where `Phi` refers to the instance of `Phi_net` and `D` to the instance of `D_net`.

**Not only do I want to compute this squared cosine similarity, I also want to compute its derivative w.r.t. the model parameters in D_net.** I tried to do this in the following code:

```
Phi_net = Phi_Net()
D_net = D_Net()
z = torch.tensor([1., .5], requires_grad=True)
x = D_net(z).requires_grad_(True)
x.retain_grad()
phi = Phi_net(x)

# ------Compute squared cosine similarity----
phi.backward(retain_graph=True, create_graph=True)
dphi_dx = x.grad.clone()
print("dphi_dx: ", dphi_dx.clone().detach())
print("norm dphi_dx: ", torch.square(dphi_dx.clone().detach().norm(p=2)))

x.backward(gradient=dphi_dx, retain_graph=True, create_graph=True)
dot_prods = z.grad.clone()
print("dot_prods: ", dot_prods.clone().detach())
z.grad.zero_()

norm_sq_dD_dz = torch.zeros_like(dot_prods)
for i in range(x.shape[-1]):
    unit_vec = torch.zeros(x.shape[-1], requires_grad=False)
    unit_vec[i] = 1.
    x.backward(gradient=unit_vec, retain_graph=True, create_graph=True)
    norm_sq_dD_dz.add_(torch.square(z.grad.clone()))
    z.grad.zero_()
print("norm_sq_dD_dz: ", norm_sq_dD_dz.clone().detach())

cossim_sq = torch.square(dot_prods) / (torch.square(dphi_dx.norm(p=2)) * norm_sq_dD_dz)
print("cossim_sq: ", cossim_sq.clone().detach())
D_net.fc.weight.grad.data.zero_()

# ------Compute gradient of squared cossim w.r.t weights in D_net----
cossim_sq[0].backward()
print("D_net.fc.weight.grad.data:\n", D_net.fc.weight.grad.data)
```

The output was

```
dphi_dx: tensor([1., 2., 3., 4.])
norm dphi_dx: tensor(30.0000)
dot_prods: tensor([16., 20.])
norm_sq_dD_dz: tensor([ 3., 25.])
cossim_sq: tensor([2.8444, 0.5333])
D_net.fc.weight.grad.data:
tensor([[-1.1852, 0.0000],
[ 1.4222, 0.0000],
[ 0.2370, 0.0000],
[ 0.9481, 0.0000]])
```

I computed the derivatives by hand and every output above matched except for `D_net.fc.weight.grad.data`, i.e. the gradient of the squared cosine similarity. In particular, I checked the value `-1.1852` and it did not match my result.
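For comparison, here is a hand derivation for the first column of `D_net.fc.weight`, written out as a sketch. The symbols `w` and `d` are notation introduced here, not from the post: `w` is the gradient of `Phi` w.r.t. its input, which for the toy weights is (1, 2, 3, 4), and `d` is the partial derivative of `D` w.r.t. z_1, i.e. the first weight column (1, 0, 1, 1). It assumes the quantity is the usual squared cosine similarity and that `w` does not depend on `d`, which holds here because `Phi_net` is linear.

```latex
\cos^2(w, d) = \frac{(w^\top d)^2}{\|w\|_2^2 \, \|d\|_2^2},
\qquad
\frac{\partial \cos^2(w, d)}{\partial d}
  = \frac{2\,(w^\top d)}{\|w\|_2^2 \, \|d\|_2^2}\, w
  - \frac{2\,(w^\top d)^2}{\|w\|_2^2 \, \|d\|_2^4}\, d

% Toy values: w^T d = 8, ||w||^2 = 30, ||d||^2 = 3
\cos^2(w, d) = \frac{64}{90} \approx 0.7111,
\qquad
\frac{\partial \cos^2}{\partial d}
  = \frac{16}{90}\,(1, 2, 3, 4)^\top - \frac{128}{270}\,(1, 0, 1, 1)^\top
  \approx (-0.2963,\ 0.3556,\ 0.0593,\ 0.2370)^\top
```

Under these assumptions the hand values are one quarter of the printed ones (0.7111 vs. 2.8444, and -0.2963 vs. -1.1852). That would be consistent with the printed `dot_prods` (16) being twice the dot product above (8), which is worth checking, since a squared cosine similarity should not exceed 1.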

- It looks like the squared cosine similarity was computed correctly.
- But not the gradient of the squared cosine similarity w.r.t. the parameters of `D_net`.
- I may have miscalculated my derivatives by hand, though I have checked many times and `-1.1852` did not match. I am not too familiar with autograd and hoped someone could look over the code to check where my mistake is (if there is indeed one) and/or tell me how to compute the gradient that I want correctly and most efficiently.
- Can I avoid doing my for loop `for i in range(x.shape[-1]):`? I don't think so, but I am happy if someone has a suggestion.
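On the for-loop question, one possible way to avoid both the explicit loop and the accumulation into `.grad` buffers is sketched below. It is a sketch under assumptions, not a confirmed fix. It uses `torch.autograd.grad`, which returns gradients instead of adding them to `.grad`, and `torch.autograd.functional.jacobian` with `create_graph=True` to get the Jacobian of `D_net` at `z` in a differentiable form. The helper name `cossim_sq_and_grad` and its `index` argument are made up for illustration, and the weights are set with `copy_` rather than `.data`.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian


class Phi_Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)
        with torch.no_grad():
            self.fc.weight.copy_(torch.tensor([[1., 2., 3., 4.]]))
            self.fc.bias.zero_()

    def forward(self, x):
        return self.fc(x)


class D_Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 4)
        with torch.no_grad():
            self.fc.weight.copy_(torch.tensor([[1., 0.],
                                               [0., 5.],
                                               [1., 0.],
                                               [1., 0.]]))
            self.fc.bias.zero_()

    def forward(self, z):
        return self.fc(z)


def cossim_sq_and_grad(Phi_net, D_net, z, index=0):
    """Hypothetical helper: squared cosine similarity per latent dimension,
    plus the gradient of entry `index` w.r.t. D_net.fc.weight."""
    x = D_net(z)
    phi = Phi_net(x)

    # Gradient of Phi w.r.t. its input at x = D(z); returned, not accumulated.
    (dphi_dx,) = torch.autograd.grad(phi.sum(), x, create_graph=True)

    # Jacobian dD/dz of shape (4, 2); column j is the partial w.r.t. z_j.
    # create_graph=True keeps it differentiable w.r.t. D_net's parameters.
    J = jacobian(D_net, z, create_graph=True)

    dot_prods = dphi_dx @ J                 # shape (2,)
    norm_sq_dD_dz = J.pow(2).sum(dim=0)     # shape (2,)
    cossim_sq = dot_prods.pow(2) / (dphi_dx.pow(2).sum() * norm_sq_dD_dz)

    # Gradient of the selected entry w.r.t. the weights of D_net.
    (grad_w,) = torch.autograd.grad(cossim_sq[index], D_net.fc.weight)
    return cossim_sq.detach(), grad_w


Phi_net = Phi_Net()
D_net = D_Net()
z = torch.tensor([1., .5])

cossim_sq, grad_w = cossim_sq_and_grad(Phi_net, D_net, z, index=0)
print("cossim_sq:", cossim_sq)
print("grad w.r.t. D_net.fc.weight:\n", grad_w)
```

For the toy weights, working the formula by hand gives a squared cosine similarity of 64/90 ≈ 0.7111 for z_1, so that is what this sketch should print for the first entry. As far as I know, `jacobian` still loops over the outputs internally by default, so this mainly removes the manual bookkeeping rather than the cost of the loop.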

Thank you so much for any help in advance!

Best

Stefan