Fit the gradient of a network that takes a function of the inputs

Hi all,
I want to train a network where, instead of the inputs themselves, I first pass a function of the inputs to the network, and I also want to fit the gradient of the network with respect to the inputs. I have written code for this, but it doesn’t work. Here is a simple example of what I want to do:

import torch
import torch.nn as nn


x_data = torch.rand((20, 3))   # 20 points in 3D
y_data = torch.rand(20)        # per-point target values (unused in the loop below)
f_data = torch.rand((20, 3))   # per-point target gradients ("forces")

def dist1(x, data):
    # distances from x to every point in data, shape (20,)
    return torch.norm(data - x, dim=-1)

net = torch.nn.Sequential(
        torch.nn.Linear(20,25),
        torch.nn.Sigmoid(),
        torch.nn.Linear(25,15),
        torch.nn.Sigmoid(),
        torch.nn.Linear(15,1))

optimizer = torch.optim.Adam(net.parameters(), lr= 0.1)

for i in range(10):
    n=torch.randint(0,20, (1,))
    xi = x_data[n]
    yi = y_data[n]
    xi.requires_grad_()       # so we can differentiate the energy w.r.t. xi
    env = dist1(xi, x_data)   # network input: distances from xi to all data points
    
    
    energy = net(env)
    energy.backward(create_graph=True)  # keep the graph so the force stays differentiable
    force = xi.grad                     # force = d(energy)/d(xi)
    
    loss = torch.mean((force - f_data[n])**2)
    loss.backward()
    optimizer.step()
    
    print('=============================')
    for name, param in net.named_parameters():
        print('name: ', name)
        print('param: ', param)
    print('=============================')

Hi,

I think you’re missing an optimizer.zero_grad() before the last .backward() that computes your gradient step.
Also, a slight optimization to avoid computing gradients for all of the network’s parameters when computing the force: force, = torch.autograd.grad(energy, xi, create_graph=True).
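
For concreteness, here is a minimal sketch of the inner loop with both changes applied (it reuses net, optimizer, x_data, f_data and dist1 from your post):

import torch

for i in range(10):
    n = torch.randint(0, 20, (1,))
    xi = x_data[n]
    xi.requires_grad_()
    env = dist1(xi, x_data)

    energy = net(env)
    # differentiate only w.r.t. xi instead of filling .grad on every parameter
    force, = torch.autograd.grad(energy, xi, create_graph=True)

    loss = torch.mean((force - f_data[n])**2)
    optimizer.zero_grad()   # clear stale parameter gradients from the previous step
    loss.backward()
    optimizer.step()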

I tried both of your suggestions, but I still get nan values for my parameters.

You can add torch.autograd.set_detect_anomaly(True) at the beginning of your script to find out where the nans first appear in the backward pass.
In your particular case, the problem is that in your dist1 function you take the second derivative of the norm at 0. Since this derivative involves a division by the norm value, you get nans.
You will need to make sure that data - x != 0 before taking the norm.
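
For example, one way to keep the input size at 20 while avoiding the division by zero is to clamp the squared distance from below by a small epsilon before taking the square root. This is just a sketch, and the epsilon value is arbitrary:

def dist1(x, data, eps=1e-12):
    # squared distances from x to every point in data, shape (20,)
    sq = ((data - x) ** 2).sum(dim=-1)
    # clamping keeps the zero self-distance away from 0, so the second
    # derivative of the square root no longer divides by zero
    return torch.sqrt(sq.clamp_min(eps))

Alternatively, you could exclude the sampled point from data before calling dist1, but then the first Linear layer would need 19 inputs instead of 20.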