Hello everyone, I am trying to solve a minimization problem using SGD.
In particular, I have an objective function (or loss) that looks like this:
where q_theta is parametrized by a fully connected NN and has the form:
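
    import torch

    def objective(p, output):
        x, y = p
        a = minA  # minA, minB: globals not shown here
        b = minB
        r = 0.1
        # Smoothed indicators of the discs of radius r + 0.02 around (a, a) and (b, b)
        XA = 1/2 - 1/2 * torch.tanh(100 * ((x - a)**2 + (y - a)**2 - (r + 0.02)**2))
        XB = 1/2 - 1/2 * torch.tanh(100 * ((x - b)**2 + (y - b)**2 - (r + 0.02)**2))
        q = (1 - XA) * ((1 - XB) * output - XB)
        return q
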
“output” is the output of the NN, i.e. the only part of this function that depends on the trainable parameters.
Now, my training function looks like this:

    import torch.optim as optim

    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    for e in range(epochs):
        # total: the array of independently sampled points
        for configuration in total:
            optimizer.zero_grad()
            # output is q~
            output = model(configuration)
            # loss is the objective function we defined
            # (in the paper, the objective function is (18))
            loss = objective(configuration, output)
            loss.backward()
            optimizer.step()

My model is a simple two-layer fully connected NN with two inputs (x, y) and one output node, which corresponds to the parametrized part of the function.
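For concreteness, a minimal sketch of such a model. Only the two inputs and the single output come from the description above; the hidden width, the tanh activation, and the test point are placeholders chosen for illustration.

```python
import torch
import torch.nn as nn

# Assumption: hidden width and activation are illustrative placeholders.
HIDDEN = 32

model = nn.Sequential(
    nn.Linear(2, HIDDEN),   # input layer: the point (x, y)
    nn.Tanh(),
    nn.Linear(HIDDEN, 1),   # single output node: the parametrized part of q
)

# Hypothetical test point, just to check the shapes.
point = torch.tensor([0.3, -0.2])
output = model(point)       # tensor of shape (1,)
```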
Note that each “configuration” is a point in 2D space, sampled independently from a distribution. Averaging over these samples approximates the expectation in (18).
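To show what I mean by the sample average, here is a small sketch. The uniform distribution and the function `f` are hypothetical stand-ins, not the actual sampling distribution or the actual term inside the expectation in (18).

```python
import torch

torch.manual_seed(0)

# Assumption: N i.i.d. points from a placeholder uniform distribution on [-1, 1]^2.
N = 1000
total = torch.rand(N, 2) * 2 - 1   # shape (N, 2); iterating yields one point per row

def f(p):
    # Hypothetical per-point term, standing in for the quantity inside the expectation.
    x, y = p
    return x**2 + y**2

# Sample average (1/N) * sum_i f(p_i), used as an estimate of E[f(p)].
estimate = torch.stack([f(p) for p in total]).mean()
```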
However, the function I get after minimization does not make any sense. In particular, I am not sure I am handling the objective function correctly. Does .backward() take care of the gradient that appears in (18), or should I compute that gradient explicitly with autograd?
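To make the question concrete, this is a sketch of the second option. It assumes that the gradient in (18) is taken with respect to the input point (x, y) and that the per-point loss is its squared norm; both are assumptions for illustration and may not match the paper exactly. The small model and the sample point are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the network; sizes are placeholders.
model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))

# One sampled point; requires_grad=True so q can be differentiated w.r.t. it.
p = torch.tensor([0.3, -0.2], requires_grad=True)

# Stand-in for objective(p, model(p)): a scalar q that depends on p and the parameters.
q = model(p).squeeze()

# Gradient of q with respect to the input point. create_graph=True keeps this
# gradient in the autograd graph so it can itself be differentiated later.
(grad_q,) = torch.autograd.grad(q, p, create_graph=True)

# Assumed per-point loss: squared norm of the input gradient.
loss = (grad_q ** 2).sum()

# .backward() now differentiates the loss w.r.t. the network parameters.
loss.backward()
```

As far as I understand, .backward() on its own only fills in gradients with respect to the network parameters, so if the loss itself contains a derivative with respect to (x, y), that derivative would have to be built explicitly as above before calling .backward().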