Find global minimum with GD

I have trained a NN with a vector input and a scalar output (regression).
Now I want to find the global minimum of the NN using GD with PyTorch.

I’m new to programming in general, python specifically, and pytorch even more specifically.

I believe what I’m trying to do must have been done a thousand times before, if not ten thousand times. I’ll be super happy and grateful if anyone could point me to some code somewhere (maybe in github) where there’s an example of what I’m trying to do that I could adjust to my needs.

In other words, I want to know which inputs give the minimum (or maximum) output.

Have a look at this example here:

https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors-and-autograd

It updates the parameters manually in this step:

with torch.no_grad():
    w1 -= learning_rate * w1.grad
    w2 -= learning_rate * w2.grad

to minimize the loss function, but you can also use the SGD optimizer, which is explained here: https://pytorch.org/docs/stable/optim.html#module-torch.optim

If you want to do GD specifically, not SGD, then each batch would basically be your entire training set (if you can fit it into memory). Otherwise, accumulate the error (or its gradients) over all batches before you do the manual update.
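
As a rough illustration of the accumulation idea, here is a minimal sketch. The toy data, the chunk size, and the learning rate are made up for the example. It relies on the fact that each `backward()` call adds to `.grad`, so one parameter update per full pass over the data amounts to plain gradient descent.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy regression data: 64 samples, 3 features.
X = torch.randn(64, 3)
true_w = torch.tensor([[1.0], [-2.0], [0.5]])
y = X @ true_w

w = torch.zeros(3, 1, requires_grad=True)
learning_rate = 0.1
batch_size = 16  # assumed chunk size that fits in memory

for epoch in range(200):
    # Each backward() call adds to w.grad, so looping over all chunks
    # accumulates the gradient of the full-dataset loss.
    for start in range(0, X.shape[0], batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Divide by the dataset size so the chunks sum to the mean squared error.
        loss = ((xb @ w - yb) ** 2).sum() / X.shape[0]
        loss.backward()
    # One update per pass over the data: full-batch gradient descent.
    with torch.no_grad():
        w -= learning_rate * w.grad
        w.grad.zero_()

final_loss = ((X @ w - y) ** 2).mean().item()
```

The gradient is zeroed only after the update, not after every chunk; zeroing it inside the inner loop would turn this back into mini-batch SGD.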

Thank you very much!
I have changed the code you sent me to fit my problem.
My problem is not to train the net parameters, as in the code you sent, but rather to “train” the input.
In other words, after having trained my net, I want to find the input that gives the maximum output.

The changes I made are as follows:

  1. I added ‘requires_grad’ to the input.
  2. I changed the loss to ‘-output’ (the output with a minus sign) because I’m looking for the maximum output.
  3. Instead of updating the parameters (w1, w2 in the example), I’m updating the input (x in the example).

Here is the code with my changes:
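
The poster's code is not preserved in this copy of the thread. A minimal sketch of the three changes listed above, adapted from the linked tutorial example, could look like the following. The layer sizes, the random fixed weights, the learning rate, and the number of steps are assumptions for illustration, not the poster's actual values.

```python
import torch

torch.manual_seed(0)

D_in, H, D_out = 11, 100, 1  # sizes assumed for illustration

# Fixed weights standing in for an already-trained network.
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

# Change 1: the input, not the weights, requires gradients.
x = torch.randn(1, D_in, requires_grad=True)

learning_rate = 1e-3
outputs = []
for step in range(100):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    outputs.append(y_pred.item())

    # Change 2: minimizing -output is the same as maximizing the output.
    loss = -y_pred.sum()
    loss.backward()

    # Change 3: update x instead of w1 and w2.
    with torch.no_grad():
        x -= learning_rate * x.grad
        x.grad.zero_()
```

Note that gradient steps like these only follow the local slope, so they find a local optimum at best. With this bias-free ReLU toy network the output can keep growing as x grows, so a real use would probably need some bound on the input.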

It works well.

However, what I want to do now is load my pre-trained net, and then instead of:
y_pred = x.mm(w1).clamp(min=0).mm(w2)

I will have:
y_pred = net(x)

However, this does not work for some reason: even after the ‘.backward()’ call, the ‘.grad’ of ‘x’ remains ‘None’. Note that this does not happen if I do:
y_pred = x.mm(w1).clamp(min=0).mm(w2)

Very strange…

I don’t understand why the ‘.grad’ of the input is computed and saved with these ‘.mm’ calls but not with ‘net()’.

Here is what my net looks like (it has 3 hidden layers; the input layer size is 11, the output layer size is 1, and all 3 hidden layers have a size of 100):

import torch.nn as nn
import torch.nn.functional as F

# Layer sizes as described above.
input_layer_size = 11
middle_layer_size = 100
final_layer_size = 1

class non_linear_Net_3_hidden(nn.Module):
    def __init__(self):
        super(non_linear_Net_3_hidden, self).__init__()
        self.fc1 = nn.Linear(input_layer_size, middle_layer_size)
        self.fc2 = nn.Linear(middle_layer_size, middle_layer_size)
        self.fc4 = nn.Linear(middle_layer_size, middle_layer_size)
        # Despite its name, fc3 is the output layer; fc4 is the third hidden layer.
        self.fc3 = nn.Linear(middle_layer_size, final_layer_size)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = F.sigmoid(self.fc4(x))
        x = self.fc3(x)
        return x

I see, so in this case you can take the same approach but step in the opposite direction of the gradient with respect to the input features rather than the weights. I think that’s what you are doing now.

Don’t know why it wouldn’t work, but you sure do have a lot of x’s there :wink:

x = F.sigmoid(self.fc1(x))
x = F.sigmoid(self.fc2(x))
x = F.sigmoid(self.fc4(x))
x = self.fc3(x)
return x

To disentangle that a bit, I would make sure that you keep your input “x” distinct from the rest and see if that solves the problem

Thanks!
By ‘disentangle’, do you mean something like this:

def forward(self, x):
    y = F.sigmoid(self.fc1(x))
    y = F.sigmoid(self.fc2(y))
    y = F.sigmoid(self.fc4(y))
    y = self.fc3(y)
    return y

I have tried it; it didn’t help.

Here is something I’ve discovered which may shed some light on this problem:
the ‘.grad’ of the output also remains ‘None’, even though the output also has ‘requires_grad = True’.
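
For context on the observation above: autograd only stores `.grad` on leaf tensors by default, so a `None` gradient on the network output is expected. The thread does not say why `x.grad` stayed `None`. One possible cause (an assumption, not confirmed here) is that `x` was no longer a leaf by the time it reached `net`, for example because it was the result of another operation. The sketch below uses a small stand-in network, not the poster's trained model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in for the pre-trained network (sizes taken from the post).
net = nn.Sequential(
    nn.Linear(11, 100), nn.Sigmoid(),
    nn.Linear(100, 1),
)

# A leaf tensor: created directly by the user with requires_grad=True.
x = torch.randn(1, 11, requires_grad=True)

y_pred = net(x)
y_pred.retain_grad()  # optional: ask autograd to keep .grad on this non-leaf tensor
loss = -y_pred.sum()
loss.backward()

print(x.is_leaf, x.grad is not None)  # leaf: .grad is populated after backward()
print(y_pred.is_leaf, y_pred.grad)    # non-leaf: .grad kept only because of retain_grad()

# A tensor derived from x is not a leaf, so autograd would not store its .grad
# unless retain_grad() is called on it.
x_derived = x * 1.0
print(x_derived.is_leaf)
```

Checking `x.is_leaf` right before calling `net(x)` is a quick way to test this hypothesis.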

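Edit:
I was given some more good advice and decided to do it like this:
# torch.autograd.grad returns a tuple with one gradient per input.
(grad,) = torch.autograd.grad(loss, input)
# Update the input in place without recording the update in the graph.
with torch.no_grad():
    input -= learning_rate * grad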

This works, but now I have another problem:
my input needs to go through a softmax function (it needs to be all positive and sum to 1),
and for some reason it always becomes a one-hot vector (one element equals 1 and all the others are 0),
which makes the softmax function output ‘NaN’.
Each time, a different element becomes 1 while all the others are zero.
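
The thread does not record how this was resolved. One common way to keep an input positive and summing to 1 is to optimize unconstrained logits and pass their softmax to the network, so the constraint holds at every step. The sketch below assumes a small stand-in network, a hypothetical logit vector `z`, and an arbitrary learning rate; it is not the poster's solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pre-trained network; its weights are frozen.
net = nn.Sequential(
    nn.Linear(11, 100), nn.Sigmoid(),
    nn.Linear(100, 1),
)
for p in net.parameters():
    p.requires_grad_(False)

# Optimize unconstrained logits z; the network only ever sees softmax(z).
z = torch.zeros(11, requires_grad=True)
learning_rate = 1.0  # assumed value for illustration

outputs = []
for step in range(200):
    x = torch.softmax(z, dim=0)  # all positive, sums to 1
    y_pred = net(x.unsqueeze(0))
    outputs.append(y_pred.item())

    loss = -y_pred.sum()  # maximize the output
    (grad,) = torch.autograd.grad(loss, z)
    with torch.no_grad():
        z -= learning_rate * grad

x_best = torch.softmax(z, dim=0).detach()
```

Because the update is applied to `z` rather than to the softmax output, the input the network sees never leaves the simplex, and no re-normalization step is needed between updates.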

Ok. :smile: thank you very much for your help. Everything works now. Have a great autumn.

It seems you have solved the problem. Could you briefly show the solution?