I have trained a NN with a vector input and a scalar output (regression).
Now I want to find the global minimum of the NN using GD with PyTorch.

I’m new to programming in general, python specifically, and pytorch even more specifically.

I believe what I’m trying to do must have been done a thousand times before, if not ten thousand times. I’ll be super happy and grateful if anyone could point me to some code somewhere (maybe on GitHub) with an example of what I’m trying to do that I could adjust to my needs.

Now I want to find the global minimum of the NN using GD with PyTorch.
I believe what I’m trying to do must have been done a thousand times before, if not ten thousand times.

If you want to do GD specifically, not SGD, then each batch would basically be your entire training set (if you can fit it into memory). Otherwise, just accumulate the error before you do the manual update.
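For illustration, a minimal full-batch GD loop might look like this (the dataset, shapes, and model here are made up, not from the thread):

```python
import torch

torch.manual_seed(0)

# Toy full-batch "dataset" (shapes are illustrative only)
X = torch.randn(64, 11)
Y = torch.randn(64, 1)

# A small regression model, also just for illustration
model = torch.nn.Sequential(
    torch.nn.Linear(11, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

with torch.no_grad():
    initial_loss = loss_fn(model(X), Y).item()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)  # one "batch" = the entire training set
    loss.backward()
    optimizer.step()
```

Because every step sees the whole training set, this is plain (deterministic) gradient descent rather than SGD.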

thank you very much!
I have changed the code you sent me so as to fit my problem.
my problem is not to train the net parameters, as in the code you sent, but rather to “train” the input.
in other words, after having trained my net, I want to find the input that will give the maximum output.

the changes I made are as follows:

I added ‘requires_grad = True’ to the input

I changed the loss to be ‘-output’ (the output with a minus sign) because I’m looking for the maximum output

Instead of updating the parameters (w1, w2 in the example), I’m updating the input (x in the example)
here is the code with my changes:

It works well.
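For reference, the three changes described above can be sketched like this (a hedged reconstruction with made-up shapes and weights, not the exact code from the thread):

```python
import torch

torch.manual_seed(0)

# Stand-ins for the frozen, already-trained weights
w1 = torch.randn(11, 100)
w2 = torch.randn(100, 1)

# Change 1: the *input* requires grad, not the weights
x = torch.randn(1, 11, requires_grad=True)

with torch.no_grad():
    initial_output = x.mm(w1).clamp(min=0).mm(w2).item()

learning_rate = 1e-2
for step in range(50):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = -y_pred                        # Change 2: minus sign, because we maximize
    loss.backward()
    with torch.no_grad():
        x -= learning_rate * x.grad       # Change 3: update the input, not w1/w2
        x.grad.zero_()
```

The in-place update on a leaf tensor that requires grad has to happen inside ‘torch.no_grad()’, and the gradient must be zeroed each step so it doesn’t accumulate.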

however, what I want to do now is to load my pre-trained net and then, instead of:
y_pred = x.mm(w1).clamp(min=0).mm(w2)

I will have:
y_pred = net(x)

however, this does not work for some reason: even after the ‘.backward()’ call, the ‘.grad’ of ‘x’ remains ‘None’. note that this does not happen if I do:
y_pred = x.mm(w1).clamp(min=0).mm(w2)
!!!

very strange…

I don’t understand why with these ‘.mm’ commands the ‘.grad’ of the input is calculated and saved, but with ‘net()’ it isn’t.
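One common cause of this (I can’t tell from the post whether it’s yours): ‘.grad’ is only populated on leaf tensors. If ‘x’ is the result of some operation, or was moved with ‘.to()’/‘.cuda()’ after ‘requires_grad’ was set, it is no longer a leaf, and ‘backward()’ will leave its ‘.grad’ as ‘None’. A quick check with a stand-in net:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(11, 100), nn.Sigmoid(), nn.Linear(100, 1))

# Leaf tensor: backward() populates .grad
x = torch.randn(1, 11, requires_grad=True)
net(x).backward()
print(x.is_leaf, x.grad is not None)   # True True

# Non-leaf tensor (the result of an operation): .grad stays None,
# and PyTorch warns when you access it
x0 = torch.randn(1, 11, requires_grad=True)
x2 = x0 * 2.0                          # x2 is no longer a leaf
net(x2).backward()
print(x2.is_leaf, x2.grad is None)     # False True
```

In the second case the gradient does flow (it ends up in ‘x0.grad’); it just isn’t stored on the intermediate tensor ‘x2’.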

here is what my net looks like (it has 3 hidden layers; the input layer size is 11, the output layer size is 1, and all 3 hidden layers have a size of 100):

class non_linear_Net_3_hidden(nn.Module):
    def __init__(self):
        super(non_linear_Net_3_hidden, self).__init__()
        self.fc1 = nn.Linear(input_layer_size, middle_layer_size)
        self.fc2 = nn.Linear(middle_layer_size, middle_layer_size)
        self.fc4 = nn.Linear(middle_layer_size, middle_layer_size)
        self.fc3 = nn.Linear(middle_layer_size, final_layer_size)

    def forward(self, x):
        x = F.sigmoid(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = F.sigmoid(self.fc4(x))
        x = self.fc3(x)
        return x

I see, so in this case you can take the same approach but go in the opposite direction of the gradient of the input features rather than the weights. I think that’s what you are doing now.

I will have:
y_pred = net(x)

however, this does not work for some reason. meaning, even after the ‘.backward()’ command the ‘.grad’ of ‘x’ remains ‘None’.

Don’t know why it wouldn’t work, but you sure do have a lot of x’s there

    x = F.sigmoid(self.fc1(x))
    x = F.sigmoid(self.fc2(x))
    x = F.sigmoid(self.fc4(x))
    x = self.fc3(x)
    return x

To disentangle that a bit, I would make sure that you keep your input “x” distinct from the rest and see if that solves the problem.

thanks!
by ‘disentangle’, do you mean something like this:

def forward(self, x):
    y = F.sigmoid(self.fc1(x))
    y = F.sigmoid(self.fc2(y))
    y = F.sigmoid(self.fc4(y))
    y = self.fc3(y)
    return y

I have tried it. It didn’t help.

here is something I’ve discovered which may shed some light on this problem:
the output also remains with ‘grad = None’, even though it also has ‘requires_grad = True’.

…

edit:
I was given some more good advice and decided to do it like this:

grad = torch.autograd.grad(loss, input)[0]
with torch.no_grad():
    input -= learning_rate * grad
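Wrapped into a full loop, that approach might look like this (the net here is a stand-in for the loaded, pre-trained one; the learning rate, shapes, and step count are assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the loaded, pre-trained net (architecture guessed from the thread)
net = nn.Sequential(nn.Linear(11, 100), nn.Sigmoid(), nn.Linear(100, 1))
for p in net.parameters():
    p.requires_grad_(False)   # freeze the trained weights

input = torch.randn(1, 11, requires_grad=True)

with torch.no_grad():
    initial_output = net(input).item()

learning_rate = 0.1
for step in range(100):
    loss = -net(input)        # minimizing -output == maximizing output
    grad = torch.autograd.grad(loss, input)[0]
    with torch.no_grad():
        input -= learning_rate * grad
```

Using ‘torch.autograd.grad’ avoids relying on ‘.grad’ being populated at all, and the ‘torch.no_grad()’ block keeps the in-place update out of the autograd graph.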

which works, only now I get another problem:
my input needs to go through a softmax function (it needs to be all positive with a sum of 1),
and for some reason it always becomes a one-hot vector (one element equals 1 and all the others 0),
which makes the softmax function output ‘NaN’.
every time, a different element becomes 1 with all the others being zero.
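One common workaround for this kind of constraint (offered as a suggestion, not something settled in the thread): optimize an unconstrained ‘logits’ vector and apply the softmax inside the computation, so the actual input to the net is always a valid probability vector and the gradient flows through the softmax:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pre-trained net (architecture guessed from the thread)
net = nn.Sequential(nn.Linear(11, 100), nn.Sigmoid(), nn.Linear(100, 1))
for p in net.parameters():
    p.requires_grad_(False)

# Optimize unconstrained logits; softmax keeps the actual input valid
logits = torch.zeros(1, 11, requires_grad=True)

learning_rate = 0.1
for step in range(100):
    x = torch.softmax(logits, dim=1)   # always positive, always sums to 1
    loss = -net(x)
    grad = torch.autograd.grad(loss, logits)[0]
    with torch.no_grad():
        logits -= learning_rate * grad

with torch.no_grad():
    best_x = torch.softmax(logits, dim=1)
```

This way the update step never has to renormalize or clip the input itself, which is one plausible source of the one-hot / NaN behavior described above.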