Loss function applied to first order derivative of conv network

Hi,
I have been hitting my head against the wall on this for a while… maybe some kind person can help me out?

In a simple example I am trying to compute the second-order derivative and put a cost function on it, for example (d(dy/dx)/dx - 0)^2.
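To show the pattern in isolation, here is a toy sketch of my own (a single scalar weight w and input x, not the actual model below): the cost is put on dy/dx, and backpropagating through it needs the second derivative, hence create_graph=True.

import torch

# Toy sketch: y = w * x^2, cost placed on dy/dx, then backprop into w.grad
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(1.0, requires_grad=True)

y = w * x**2
dydx, = torch.autograd.grad(y, x, create_graph=True)  # dy/dx = 2*w*x
cost = (dydx - 0)**2                                   # cost on the derivative
cost.backward()                                        # d(cost)/dw = 8*w*x^2
print(w.grad)                                          # tensor(24.)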

In the example there is a 1x1 Conv2d (weights filled with 1s, bias 0) and a 1x1 Linear layer (weights filled with 1s, bias 0).

I expect to see [16] in both the linear layer's and the conv layer's .grad, but I only see it in the linear layer's .grad; the conv layer's .grad is None, even though as far as I can tell both are used in almost exactly the same way.

Does anyone know what I am doing wrong here, or how to fix it? I would like the cost function applied to the first derivative to backprop into the .grad of the convolutional layer's weights, so that I can run gradient descent on the weights to minimize the cost function above.
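For reference, this is how I arrive at the expected value of 16, writing the conv weight as $w_c$ and the linear weight as $w_l$ (both 1, biases 0) and fixing the input at $x_0 = 1$:

$f(x_0) = (w_c + w_l)\,x_0^2$
$\frac{\partial f}{\partial x_0} = 2(w_c + w_l)\,x_0$
$L = \left(\frac{\partial f}{\partial x_0}\right)^2 = 4(w_c + w_l)^2\,x_0^2$
$\frac{\partial L}{\partial w_c} = \frac{\partial L}{\partial w_l} = 8(w_c + w_l)\,x_0^2 = 16$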

On a slightly separate note, does anyone know where I can find more documentation on autograd.grad and autograd.backward? The current online documentation seems out of date (only_inputs, for example, is deprecated), and I'm not sure if there is a better explanation of the graph vs autograd.grad vs autograd.backward somewhere that I am missing…
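In case it helps anyone reading later, here is a small sketch of my current understanding of the difference (happy to be corrected): autograd.grad returns the gradients and leaves .grad untouched, while autograd.backward (or tensor.backward()) accumulates into .grad.

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x**2

# autograd.grad: returns the gradient, does not populate x.grad
g, = torch.autograd.grad(y, x, create_graph=True)  # g = dy/dx = 4.0, x.grad is still None

# autograd.backward (same as g.backward()): accumulates into .grad
torch.autograd.backward(g)                          # backprops d(dy/dx)/dx through the graph
print(x.grad)                                       # tensor(2.)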

Thank you,

Misko

import torch

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = torch.nn.Conv2d(1, 1, 1)
        self.conv.weight.data.fill_(1)
        self.conv.bias.data.fill_(0)
        self.linear = torch.nn.Linear(1, 1)
        self.linear.weight.data.fill_(1)
        self.linear.bias.data.fill_(0)

    def forward(self, xo):
        # squaring the input is required, otherwise the derivative is a constant
        # that does not depend on the input
        x = self.conv(xo.reshape(1, 1, 1, 1)**2).sum().reshape(1, 1)
        x += self.linear(xo.reshape(1, 1)**2)
        # first derivative dy/dx, kept in the graph so we can backprop through it
        l = torch.autograd.grad(outputs=x, inputs=xo, create_graph=True)[0]
        y = (l**2).sum()
        self.zero_grad()
        y.backward()
        return y

model = Model()
x = torch.tensor([[1.0]], requires_grad=True)
loss = model(x)
for name, parameter in model.named_parameters():
    print("GRAD", name, parameter.size(), parameter.grad)

I am still having trouble with this :frowning: but I think I have an easier example… If you use just one Linear(1,1) module, everything works as expected, but if you use one Conv2d on a 1x1x1x1 input (equivalent to the linear module), you get no gradient on the weights in the convolution…

import torch
from torch import nn
from torchviz import make_dot

model = nn.Sequential()

# if it is only one conv module in the model
#model.add_module('C0', nn.Conv2d(1, 1, 1))   # either uncomment this line

# if it is just one linear module in the model
model.add_module('W0', nn.Linear(1, 1))       # or uncomment this line

x = torch.randn(1, 1, 1, 1).requires_grad_(True)

def double_backprop(inputs, net):
    # forward pass, then differentiate the output w.r.t. the input,
    # keeping the graph so the result can itself be backpropagated
    y = net(inputs).mean()**2
    grad, = torch.autograd.grad(y, inputs, create_graph=True, retain_graph=True)
    return grad.mean()

dot = make_dot(double_backprop(x, model), params=dict(list(model.named_parameters()) + [('x', x)]))
dot.render(view=True)

I think this was flagged as a bug, and a fix is in master. See https://github.com/pytorch/pytorch/issues/15353

It’ll be part of the 1.0.1 release this week.
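Once you are on a build with the fix, a quick sanity check along the lines of your smaller example should show a gradient on the conv weight instead of None (sketch):

import torch
from torch import nn

net = nn.Conv2d(1, 1, 1)
x = torch.randn(1, 1, 1, 1, requires_grad=True)

y = net(x).mean()**2
grad, = torch.autograd.grad(y, x, create_graph=True, retain_graph=True)
grad.mean().backward()

print(net.weight.grad)  # should be a tensor now, not None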


Thank you very much!! And thank you for all the great work! Sorry to be persistent :sweat_smile:, it's just been driving me a little crazy :joy: