Optimize input instead of network weights

Mughees · March 10, 2021, 8:20pm

Hi,

I am trying to optimize the input for a task while keeping my network frozen, so I don't want its weights to be updated. I have written a minimal example, but it's not working: z has the same value in every iteration. I am sure I am making a silly mistake somewhere. Any guidance is highly appreciated.
Thanks.

import torch

z = torch.rand((1,6))
z.requires_grad_(True)
optimizer = torch.optim.SGD([z], lr=0.1)

criteria = torch.nn.MSELoss()


for i in range(10):
    optimizer.zero_grad()
    print(z)
    loss = criteria(z, z+torch.rand(1))
    #print(loss)
    loss.backward()
    optimizer.step()

Output:
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)
tensor([[0.1105, 0.8152, 0.2820, 0.1122, 0.6645, 0.7211]], requires_grad=True)

naveenvemy · March 10, 2021, 9:16pm

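There is no learning happening here. Your target is z plus a random offset, z + torch.rand(1), and because that target is built from z itself, autograd differentiates through both arguments of the MSE loss. The difference z − (z + c) is just −c, which does not depend on z, so the loss is constant with respect to z and its gradient is 0: the contributions from the prediction and from the target cancel exactly. The offset is redrawn every iteration, but that does not help, because within each iteration it is still a constant with respect to z. You can check the gradient values by printing them after each loop.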
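Writing the loss out for a single iteration makes the cancellation explicit. Here c denotes the scalar drawn by torch.rand(1) in that iteration (broadcast over all six entries of z), and MSELoss uses its default mean reduction over the 6 elements:

$$
\mathcal{L}(z) = \frac{1}{6}\sum_{i=1}^{6}\bigl(z_i - (z_i + c)\bigr)^2 = \frac{1}{6}\sum_{i=1}^{6} c^2 = c^2,
\qquad
\frac{\partial \mathcal{L}}{\partial z_j} = \frac{2}{6}\bigl(z_j - (z_j + c)\bigr)\,(1 - 1) = 0 .
$$

The factor (1 − 1) is the +1 from the prediction z_j minus the 1 from the target z_j + c. If the target t were instead independent of z (for example a tensor created once outside the loop, or one passed through .detach()), only the first term would contribute and the gradient would be \(\frac{1}{3}(z_j - t_j)\), which is non-zero whenever z_j ≠ t_j.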

for i in range(10):
    optimizer.zero_grad()
    loss = criteria(z, z+torch.rand(1))
    #print(loss)
    loss.backward()
    optimizer.step()
    print(z.grad)

Output:

tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
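A minimal sketch of the original goal, optimizing an input through a frozen network, might look like the following. The small network `net` and the desired output `target` are hypothetical stand-ins, not part of the original post. The essential change is that the target does not depend on z, so a non-zero gradient reaches z, while the frozen weights receive no gradient at all.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is reproducible

# Hypothetical stand-in for the poster's pretrained, frozen network.
net = torch.nn.Sequential(
    torch.nn.Linear(6, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),
)
net.eval()  # only matters for layers like dropout/batch norm; harmless here
for p in net.parameters():
    p.requires_grad_(False)  # freeze: autograd computes no gradients for the weights

# Snapshot of the weights, used to confirm they never change.
weights_before = [p.clone() for p in net.parameters()]

# The input is the only leaf tensor handed to the optimizer.
z = torch.rand((1, 6), requires_grad=True)
z_start = z.detach().clone()
optimizer = torch.optim.SGD([z], lr=0.1)
criterion = torch.nn.MSELoss()

# Hypothetical desired output; created once, it does not depend on z.
target = torch.tensor([[0.5]])

losses = []
for i in range(200):
    optimizer.zero_grad()
    loss = criterion(net(z), target)  # gradient flows through net back to z
    loss.backward()
    optimizer.step()  # updates z only, since it is the only parameter given
    losses.append(loss.item())

print(z)
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Passing only [z] to the optimizer is what keeps it from updating anything else; turning off requires_grad on the weights additionally avoids computing and storing their gradients. The same fix applies to the toy loop in the question: create the target once outside the loop (for example target = torch.rand((1, 6))) and compute criteria(z, target), and the printed z.grad is no longer zero.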