Unsupervised minimization with SGD

Hello everyone, I am trying to solve a minimization problem using SGD.
In particular, I have an objective function (or loss), equation (18) in the paper, that looks like this:

where q_theta is parametrized by a fully connected NN and has the form:

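import torch

def objective(p, output):
    # p is a 2D point (x, y); output is the network output at that point
    x, y = p
    a = minA  # defined elsewhere
    b = minB  # defined elsewhere
    r = 0.1

    # Smooth indicators of the discs of radius r + 0.02 around a and b:
    # close to 1 inside the disc, close to 0 outside
    XA = 1/2 - 1/2 * torch.tanh(100 * ((x - a[0])**2 + (y - a[1])**2 - (r + 0.02)**2))
    XB = 1/2 - 1/2 * torch.tanh(100 * ((x - b[0])**2 + (y - b[1])**2 - (r + 0.02)**2))
    q = (1 - XA) * ((1 - XB) * output - XB)
    return q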

“output” is the output of the NN, namely the only part of this function that is parametrized.

Now, my training function looks like this:

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

for e in range(epochs):
    for configuration in total:
        # for each point in the array of independently sampled points

        # output is q~
        output = model(configuration)

        # loss is the objective function we defined;
        # in the paper, the objective function is (18)
        loss = objective(configuration, output).backward()
        optimizer.step()

My model is a simple two-layer fully connected NN with two inputs (x, y) and one output node, which corresponds to the parametrized part of the function.
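For reference, a network of this shape could be written as follows. This is only a sketch: the hidden width (20) and the tanh activation are assumptions, since the post does not give them.

```python
import torch
import torch.nn as nn

# Two fully connected layers: 2 inputs (x, y) -> hidden layer -> 1 output.
# The hidden width of 20 and the tanh activation are assumed, not from the post.
model = nn.Sequential(
    nn.Linear(2, 20),
    nn.Tanh(),
    nn.Linear(20, 1),
)

# Evaluate the network at a single 2D point (hypothetical coordinates).
point = torch.tensor([0.3, -0.1])
output = model(point)
print(output.shape)  # a tensor with one element: the parametrized part q~
```

A smooth activation such as tanh keeps the network differentiable with respect to its inputs, which matters if the objective involves derivatives of the output with respect to (x, y).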
Note that each “configuration” is a point in 2D space, sampled independently from a distribution; averaging over these samples approximates the expectation in (18).
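The sample average can also be taken over a whole batch of points at once instead of one point per step. The sketch below uses hypothetical names throughout: `points` is drawn uniformly from [-1, 1]^2 as a stand-in for the real samples, and `per_sample_term` is a placeholder integrand, not the one from (18).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model and samples; replace with the real ones.
model = nn.Sequential(nn.Linear(2, 20), nn.Tanh(), nn.Linear(20, 1))
points = 2 * torch.rand(128, 2) - 1  # 128 points in [-1, 1]^2

def per_sample_term(points, output):
    # Placeholder for the quantity inside the expectation (here: output squared).
    return output.squeeze(-1) ** 2

values = per_sample_term(points, model(points))  # one value per sample
loss = values.mean()                             # sample average ~ expectation
loss.backward()                                  # gradients w.r.t. the weights
```

Averaging before calling `.backward()` gives a single scalar loss, and the resulting gradient is the average of the per-sample gradients.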

However, the resulting minimized function does not make any sense. In particular, I am not sure I am handling the objective function correctly. Does .backward() take the place of the gradient in (18), or should I compute that gradient myself with autograd?
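Some general facts about PyTorch may help here. `.backward()` only differentiates the scalar it is called on with respect to the model parameters; it does not add any derivative terms to the loss itself. It also returns `None`, and gradients accumulate across calls unless `optimizer.zero_grad()` is called. If the objective itself contains a gradient of q with respect to the input point, that gradient has to be built explicitly, for example with `torch.autograd.grad(..., create_graph=True)`. The sketch below ASSUMES the objective is the sample average of |∇q|^2, which may not match (18); `min_a`, `min_b`, `points`, the network sizes, and the learning rate are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for minA, minB and the sampled points from the post.
min_a = torch.tensor([-0.5, 0.0])
min_b = torch.tensor([0.5, 0.0])
r = 0.1
points = 2 * torch.rand(64, 2) - 1

model = nn.Sequential(nn.Linear(2, 20), nn.Tanh(), nn.Linear(20, 1))
optimizer = optim.SGD(model.parameters(), lr=1e-3)

def q_theta(p, output):
    # Same form as objective() in the post, written for a batch of points.
    x, y = p[:, 0], p[:, 1]
    xa = 0.5 - 0.5 * torch.tanh(100 * ((x - min_a[0])**2 + (y - min_a[1])**2 - (r + 0.02)**2))
    xb = 0.5 - 0.5 * torch.tanh(100 * ((x - min_b[0])**2 + (y - min_b[1])**2 - (r + 0.02)**2))
    return (1 - xa) * ((1 - xb) * output - xb)

def loss_fn(p):
    p = p.clone().requires_grad_(True)  # track derivatives w.r.t. the inputs
    q = q_theta(p, model(p).squeeze(-1))
    # dq/dp for every sample; create_graph=True keeps the graph so the loss
    # can still be differentiated w.r.t. the network weights afterwards.
    (grad_q,) = torch.autograd.grad(q.sum(), p, create_graph=True)
    # ASSUMED objective: sample average of the squared gradient norm.
    return (grad_q ** 2).sum(dim=1).mean()

for epoch in range(5):
    optimizer.zero_grad()   # clear gradients left over from the previous step
    loss = loss_fn(points)
    loss.backward()         # d(loss)/d(weights), computed by autograd
    optimizer.step()
```

Because each sample's q depends only on its own input point, differentiating `q.sum()` with respect to `p` yields the per-sample input gradients in one call.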