Hi,

I have been hitting my head against the wall for a while here… maybe some kind person can help me out?

In a simple example I am trying to compute the second-order derivative and put a cost function on it. The cost function is

(d(dy/dx)/dx - 0)^2
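To make sure I'm not misunderstanding the cost itself, here is a minimal standalone sketch of what I mean, using a toy function y = x**3 (my own example, unrelated to the model below):

```python
import torch

# Toy function y = x**3; the cost is the squared second derivative,
# i.e. (d(dy/dx)/dx - 0)**2.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative dy/dx = 3*x**2 = 12 at x=2;
# create_graph=True keeps it differentiable.
dy_dx, = torch.autograd.grad(y, x, create_graph=True)

# Second derivative d(dy/dx)/dx = 6*x = 12 at x=2
d2y_dx2, = torch.autograd.grad(dy_dx, x, create_graph=True)

cost = (d2y_dx2 - 0) ** 2  # (6*2)**2 = 144
```

This double-`grad` pattern with `create_graph=True` is what I am trying to apply to the model's parameters below.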

In the example there is a 1x1 2d conv (filled with 1s and 0 bias) and a 1x1 linear layer (filled with 1s and 0 bias).

I expect to see [16] in both the linear layer's and the conv layer's .grad, but I only see it in the linear layer's .grad; the conv layer's .grad is None, even though, as far as I can tell, both layers are used in an extremely similar way.

Does anyone know what I am doing wrong here, or how to fix it? I would like the cost function applied to the first derivative to backprop and update the .grad of the convolutional layer's weights, so that I can run gradient descent on the weights to minimize the cost function above.

On a slightly separate note, does anyone know where I can find more documentation on autograd.grad and autograd.backward? The current documentation online seems inaccurate (only_inputs is deprecated and obsolete), and I'm not sure whether there is a better explanation of the graph vs. autograd.grad vs. autograd.backward somewhere that I am missing…
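For what it's worth, this is my current understanding of the difference, sketched on a toy scalar (my own example, so please correct me if it's wrong):

```python
import torch

# autograd.grad returns the gradients as tensors and does NOT touch
# .grad, while tensor.backward() / autograd.backward accumulates
# gradients into the leaves' .grad attributes.
w = torch.tensor(3.0, requires_grad=True)

y = w ** 2
g, = torch.autograd.grad(y, w)  # g = 2*w = 6.0
print(g.item(), w.grad)         # w.grad is still None here

y2 = w ** 2
y2.backward()                   # accumulates d(y2)/dw into w.grad
print(w.grad.item())            # 6.0
```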

Thank you,

Misko

```
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.conv.weight.data.fill_(1)
        self.conv.bias.data.fill_(0)
        self.linear = torch.nn.Linear(1, 1)
        self.linear.weight.data.fill_(1)
        self.linear.bias.data.fill_(0)

    def forward(self, xo):
        # sum() is required, otherwise the second derivative is a scalar
        # not depending on the other variable
        x = self.conv(xo.reshape(1, 1, 1, 1) ** 2).sum().reshape(1, 1)
        x += self.linear(xo.reshape(1, 1) ** 2)
        l = torch.autograd.grad(outputs=x, inputs=xo, create_graph=True)[0]
        y = (l ** 2).sum()
        self.zero_grad()
        y.backward()

model = Model()
# Variable is deprecated; a tensor with requires_grad=True works the same
x = torch.tensor([[1.0]], requires_grad=True)
loss = model(x)
for name, parameter in model.named_parameters():
    print("GRAD", name, parameter.size(), parameter.grad)
```