Optimization of inputs


I have a softmax model. Can I calculate the gradients with respect to the input vectors so that the input vectors are optimized together with the total loss?

Through these steps, the (cross-entropy) loss is calculated and the weights and biases are updated:

loss = self.criterion(logits, labels) + self.regularizer

How can I include the input vectors in the optimization process so that the model learns and updates the weights, the biases, and the input vectors?



Yes, you certainly can.
You simply need to set requires_grad=True for the input vectors, like this:

inputs.requires_grad = True
logits = model(inputs)
loss = self.criterion(logits, labels) + self.regularizer

This will compute the gradient of the loss w.r.t. the inputs when you call loss.backward() (the result is stored in inputs.grad). You also need to add the inputs to your optimizer; otherwise, optimizer.step() will not modify them.
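A minimal, self-contained sketch of this, using a toy linear softmax classifier (nn.Linear with nn.CrossEntropyLoss) and random data as assumed stand-ins for the original model, labels and regularizer. It only shows that backward() populates inputs.grad; nothing is updated yet.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): 8 samples, 5 features, 3 classes
model = nn.Linear(5, 3)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(8, 5, requires_grad=True)  # leaf tensor tracked by autograd
labels = torch.randint(0, 3, (8,))

logits = model(inputs)
loss = criterion(logits, labels)
loss.backward()  # fills inputs.grad as well as the parameter gradients

print(inputs.grad.shape)  # torch.Size([8, 5])
print(model.weight.grad is not None)  # True
```

The gradient alone does not change inputs; that only happens once inputs is passed to an optimizer and optimizer.step() is called.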

Thank you so much for your reply. I came to the same idea yesterday, but I want to make sure of the following:

1- How can I add the input variable to the optimizer? I tried to use nn.Parameter at initialization, but it seems I cannot initialize a variable that is not defined yet. Any help?

Is it simply like this?

data = Variable(sample[0], requires_grad=True)
self.optimizer.add_param_group({"params": data})

If so, when I call print(list(self.parameters())), I don't see the input among the parameters, only the weights and biases!

*** Also, when I run the program with the two lines above, it takes a very long time. When adding the model's input to the optimization process, am I using memory unnecessarily, or is this slowdown normal?

Thanks for the clarification.

2- I would like to confirm: will adding the input to the optimizer both optimize the total loss with respect to the input data AND update the input data itself (so that the input data will no longer be the same)?

Thank you sooo much for your help


  1. Add the input tensor to the optimizer:
input.requires_grad = True
optimizer = optim.Adam([input], learning_rate)

Note that Variable is deprecated in recent versions of PyTorch: since version 0.4, tensors support autograd directly, so you can set requires_grad on the tensor itself.

If you want to add the input tensor to the optimizer along with the model parameters:

params_to_train = [input]
params_to_train += list(model.parameters())
optimizer = optim.Adam(params_to_train, learning_rate)
  2. Yes, adding the input to the optimizer will make the input trainable: optimizer.step() will update it along with the weights and biases. Make sure that requires_grad is enabled on the input tensor.
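A sketch of the combined setup, again with an assumed toy classifier and random data in place of the real model. It snapshots the inputs and the weights, runs a few Adam steps, and checks that both have changed.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy stand-ins (assumptions): a linear softmax classifier and random data
model = nn.Linear(5, 3)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(8, 5, requires_grad=True)
labels = torch.randint(0, 3, (8,))

original_inputs = inputs.detach().clone()  # snapshots for comparison
original_weight = model.weight.detach().clone()

# Optimize the input tensor together with the model parameters
params_to_train = [inputs] + list(model.parameters())
optimizer = optim.Adam(params_to_train, lr=0.01)

for step in range(20):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()  # updates the weights, the biases AND the inputs

print(torch.equal(inputs.detach(), original_inputs))  # False: inputs changed
print(torch.equal(model.weight.detach(), original_weight))  # False
```

With Adam, each element moves roughly on the order of the learning rate per step, so how far the inputs drift from the original data depends directly on the learning rate and the number of steps.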

This may help you:

The difficulty in helping you here is that you haven't provided a fully working code sample, so I am not sure exactly what you wish to do.
I would also strongly suggest that you learn how optimizers are implemented in PyTorch.

In your case, if the input is not changing (for example, you are not using a DataLoader, which would load new data at each iteration), you need to add the inputs to the optimizer when you define it:

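from torch.optim import SGD

model = MyModelClass() # initialize model
inputs = MyData() # load data (must be a tensor)
inputs.requires_grad_(True) # track gradients w.r.t. the inputs

parameters = [
    {'params': model.parameters()},
    {'params': [inputs], 'lr': inputs_learning_rate}
]
optimizer = SGD(parameters, lr=learning_rate)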

In this case, the learning rate for the model parameters would be learning_rate, while the learning rate for the inputs would be inputs_learning_rate.
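A runnable version of this with concrete stand-ins: nn.Linear replaces MyModelClass, a random tensor replaces MyData, and the two learning-rate values are arbitrary. Printing optimizer.param_groups confirms that each group keeps its own lr.

```python
import torch
import torch.nn as nn
from torch.optim import SGD

# Hypothetical values for the two learning rates
learning_rate = 0.01
inputs_learning_rate = 0.1

model = nn.Linear(5, 3)  # stand-in for MyModelClass()
inputs = torch.randn(8, 5, requires_grad=True)  # stand-in for MyData()

parameters = [
    {'params': model.parameters()},  # uses the default lr passed below
    {'params': [inputs], 'lr': inputs_learning_rate},  # overrides lr for this group
]
optimizer = SGD(parameters, lr=learning_rate)

for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'], len(group['params']))
# 0 0.01 2
# 1 0.1 1
```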

The reason you are not seeing the inputs among the parameters when calling list(self.parameters()) is probably that you are inside the model class: self.parameters() only returns the nn.Parameter objects registered as attributes of the module, and your data is not one of them. If you want the list of tensors being optimized by the optimizer, look at optimizer.param_groups (a list of dicts, each with a 'params' entry).
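A small sketch of the difference, with an assumed toy model. A plain tensor passed to the optimizer is listed in optimizer.param_groups but not in model.parameters(); wrapping the data in nn.Parameter and assigning it as a module attribute (the hypothetical Wrapper class below) is what makes it appear in parameters().

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Wrapper(nn.Module):
    """Hypothetical module that owns both a classifier and a learnable input."""
    def __init__(self, data):
        super().__init__()
        self.classifier = nn.Linear(5, 3)
        # Registering as nn.Parameter makes it show up in self.parameters()
        self.data = nn.Parameter(data.clone())

model = nn.Linear(5, 3)
inputs = torch.randn(8, 5, requires_grad=True)
optimizer = optim.SGD([inputs] + list(model.parameters()), lr=0.1)

# The plain tensor is in the optimizer but not among the module's parameters
in_module = any(p is inputs for p in model.parameters())
in_optimizer = any(p is inputs for g in optimizer.param_groups for p in g['params'])
print(in_module, in_optimizer)  # False True

wrapper = Wrapper(torch.randn(8, 5))
print([name for name, _ in wrapper.named_parameters()])
# ['data', 'classifier.weight', 'classifier.bias']
```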

Here is part of my code:

** The optimizer is defined outside this block of code **

for epoch in range(num_epochs):
    print("epoch no. ", epoch + 1)    # training loop
    train_loss = 0
    for batch_ndx, sample in enumerate(TrainLoader):
        data = Variable(sample[0], requires_grad=True)
        self.data = nn.Parameter(data)
        self.optimizer.add_param_group({"params": data})
        labels = sample[1].type(torch.LongTensor)-1
        logits, probas = self.model(data.float())
        loss = self.criterion(logits, labels) + self.regularizer
        if epoch + 1 == num_epochs:
            data_optimised = list(self.parameters())[0].clone()
        train_loss += loss.item()


The purpose of “New_data” is to return the optimized input data so that I can compare it to the original data before optimization. There are two for loops, one over epochs and one over data batches, and I only append the optimized data at the last epoch of training. ***** Does this seem logical? *****

What I observed is that the difference between the input data and the optimized data is very small, and it did not noticeably affect the result of the logistic regression. Consequently, I want to make sure of two things:

1- Is my implementation correct in how I add the data to the optimizer and optimize it?
2- If optimizing the data does not affect the regression results much, what is the utility of this step in general? I have noticed some people trying to do the same, but not many.
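For comparison, a minimal sketch of a loop in which the inputs do get optimized, under stated assumptions: the whole training set fits in one tensor, the model is a plain nn.Linear that returns logits only, and there is no regularizer. It differs from the excerpt above in a few ways: the learnable data tensor is created and registered with the optimizer once, instead of calling add_param_group for every batch (which adds a new group, and keeps a reference to that batch, on every iteration and may contribute to the slowdown); zero_grad(), backward() and step() are called, which the excerpt does not show; and the optimized data is read directly from the data tensor rather than depending on the order of self.parameters().

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Assumed toy setup: the whole training set fits in a single tensor
num_samples, num_features, num_classes = 64, 5, 3
original_data = torch.randn(num_samples, num_features)
labels = torch.randint(0, num_classes, (num_samples,))

model = nn.Linear(num_features, num_classes)  # returns logits only
criterion = nn.CrossEntropyLoss()

# Learnable copy of the data: created ONCE and registered with the optimizer ONCE
data = original_data.clone().requires_grad_(True)
optimizer = optim.SGD(
    [
        {'params': model.parameters()},
        {'params': [data], 'lr': 0.5},  # hypothetical separate lr for the inputs
    ],
    lr=0.1,
)

num_epochs, batch_size = 5, 16
for epoch in range(num_epochs):
    train_loss = 0.0
    for idx in torch.randperm(num_samples).split(batch_size):
        optimizer.zero_grad()
        logits = model(data[idx])  # indexing keeps the graph back to `data`
        loss = criterion(logits, labels[idx])
        loss.backward()  # gradients reach only the rows used in this batch
        optimizer.step()  # updates weights, biases and `data`
        train_loss += loss.item()
    print(f"epoch {epoch + 1}: loss {train_loss:.4f}")

# Optimized inputs, to compare against the original data
data_optimised = data.detach().clone()
print((data_optimised - original_data).abs().max())
```

Rows of data that are not in the current batch receive a zero gradient, so plain SGD leaves them unchanged on that step; the optimizer keeps exactly two parameter groups for the whole run.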

********** Another important question **************
In the code above, it was easy to add requires_grad=True to the data tensor. This data is produced by cross-correlating word vectors (i.e., word_vectors_1 * word_vector_2 = data). This cross-correlation is coded outside the training phase. Can I include word_vectors_1 and word_vector_2 in the optimization process as well? I won't be able to add them to the optimizer, as they are not included in the training file, where the optimizer is first defined.

So how can I add them to the optimizer while they are defined in other files in the program?
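A sketch of one way to do this, with hypothetical stand-ins: the word vectors are random leaf tensors, the cross-correlation is taken to be the elementwise product written in the post (the real operation may differ), and train is a hypothetical function. Tensors created in another file can be imported or passed in as arguments; what matters is that they are leaf tensors with requires_grad=True, that they are given to the optimizer, and that data is recomputed from them inside the loop. If data were computed once outside the loop, the second backward() would fail because the graph through the product is freed after the first, and data would not reflect the updated vectors anyway.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for word vectors created elsewhere in the program
word_vectors_1 = torch.randn(32, 10, requires_grad=True)
word_vector_2 = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 3, (32,))

def train(model, word_vectors_1, word_vector_2, labels, num_steps=10):
    """Tensors defined in another file can simply be passed in as arguments."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(
        list(model.parameters()) + [word_vectors_1, word_vector_2], lr=0.1
    )
    for _ in range(num_steps):
        optimizer.zero_grad()
        # Recompute the combination inside the loop so each step builds a fresh
        # graph from the current word vectors (elementwise product is an assumption)
        data = word_vectors_1 * word_vector_2
        loss = criterion(model(data), labels)
        loss.backward()
        optimizer.step()
    return loss.item()

before = word_vectors_1.detach().clone()
model = nn.Linear(10, 3)
train(model, word_vectors_1, word_vector_2, labels)
print(torch.equal(before, word_vectors_1.detach()))  # False: the vectors were updated
```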

Thanks for the help.


I have a similar problem.
To get the gradient w.r.t. the input, I set .requires_grad = True,
but it doesn’t work.

for num, (sample_img, sample_label) in enumerate(mnist_test):
    if num == 1:
        sample_img = sample_img.to(device)
        sample_img.requires_grad = True
        prediction = model(sample_img.unsqueeze(dim=0))
        cost = criterion(prediction, torch.tensor([sample_label]).to(device))

        plt.imshow(sample_img.detach().cpu().squeeze(), cmap='gray')
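In the snippet as posted, cost.backward() is never called, so sample_img.grad stays None. A self-contained sketch with an assumed tiny classifier and a random 1x28x28 image in place of the MNIST sample. Setting requires_grad after .to(device), as done here, matters: if it were set before moving the tensor to a GPU, the moved tensor would be a non-leaf and its .grad would not be populated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cpu")

# Stand-ins (assumptions): a tiny classifier and one fake 1x28x28 "MNIST" image
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
criterion = nn.CrossEntropyLoss()
sample_img = torch.rand(1, 28, 28)
sample_label = 3

sample_img = sample_img.to(device)
sample_img.requires_grad = True  # a leaf tensor, so .grad will be populated
prediction = model(sample_img.unsqueeze(dim=0))
cost = criterion(prediction, torch.tensor([sample_label]).to(device))
cost.backward()  # without this call, sample_img.grad stays None

saliency = sample_img.grad.abs().squeeze()  # 28x28 map of input sensitivity
print(saliency.shape)  # torch.Size([28, 28])
```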