What’s wrong with initializing the optimizer inside the training loop?
In the end, if you want to optimize the input, you will need to initialize an optimizer per input, I would say.
Once you have optimized the input, the old optimizer is no longer useful.
If you want to optimize both the model and the init tensor, you may think about having two optimizers: one for the model and another one for each tensor.
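A minimal sketch of that two-optimizer setup (the `Linear` model and the squared-output loss here are placeholders of mine, not your actual setup):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                              # stand-in model
init_tensor = torch.randn(1, 4, requires_grad=True)  # tensor to be optimized

# One optimizer for the model's parameters, another for this particular tensor.
model_opt = optim.SGD(model.parameters(), lr=0.1)
tensor_opt = optim.SGD([init_tensor], lr=0.1)

output = model(init_tensor)
loss = output.pow(2).sum()  # placeholder loss
model_opt.zero_grad()
tensor_opt.zero_grad()
loss.backward()
model_opt.step()   # updates the model weights
tensor_opt.step()  # updates init_tensor in place
```

Both optimizers see the same `loss.backward()`; each `step()` only touches the tensors it wraps.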
But when we initialize the optimizer as shown above, it can only take one step per input_tensor. So would you recommend taking multiple steps for each input_tensor until a condition is met, as shown below?
model.eval()
for inputs in train_dataloader:
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)  # SGD expects an iterable of tensors
    input_tensor = inputs["input_tensor"]  # different images from the loader
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph each step
        loss = loss_fn(output)
        model.zero_grad()
        loss.backward()
        optimizer.step()
        loss = loss.item()
I mean, as far as I understand, if you are optimizing a tensor, that means the tensor is somehow going to “remain” across several iterations. It’s a bit difficult to discuss this without knowing what you are trying to do, but yes, iterating over that tensor until you reach your target would be the expected pipeline.
Whether the adequate criterion is the loss or some other metric, I cannot know.
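As a concrete illustration of that pipeline — freeze the model, wrap only the input tensor in the optimizer, and iterate until the loss drops below a threshold. The model, its fixed weights, the target, and the threshold are all made-up placeholders:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)          # stand-in model
with torch.no_grad():
    model.weight.fill_(0.5)      # fixed weights so the example is reproducible
    model.bias.fill_(0.1)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)      # freeze the model; only the input is optimized

init_tensor = torch.ones(1, 3, requires_grad=True)
optimizer = optim.SGD([init_tensor], lr=0.1)
loss_fn = nn.MSELoss()
target = torch.zeros(1, 1)       # drive the model output toward zero
threshold = 1e-4

for step in range(1000):         # cap the iterations instead of a bare while-loop
    loss = loss_fn(model(init_tensor), target)
    if loss.item() < threshold:
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Capping the iteration count guards against a threshold that the optimization never reaches.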
In my above code snippet, I have called model.zero_grad(), but I don’t think that is correct. As I’m optimizing init_tensor, I should zero the gradients through the optimizer instead, as below:
model.eval()
for inputs in train_dataloader:
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)  # SGD expects an iterable of tensors
    input_tensor = inputs["input_tensor"]  # different images from the loader
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph each step
        loss = loss_fn(output)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss = loss.item()
That way, though, gradients keep accumulating on the model. Since the model isn’t wrapped by the optimizer, its gradients are never zeroed. (In PyTorch, if you don’t zero gradients, they accumulate on top of the previous ones.)
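A tiny demo of that accumulation behaviour on a single tensor:

```python
import torch

w = torch.ones(1, requires_grad=True)

(2 * w).sum().backward()
print(w.grad)    # tensor([2.])

# Without zeroing, the next backward() adds to the existing .grad:
(2 * w).sum().backward()
print(w.grad)    # tensor([4.])

w.grad.zero_()   # roughly what zero_grad() does for the tensors it wraps
(2 * w).sum().backward()
print(w.grad)    # tensor([2.])
```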
Thanks @JuanFMontesinos. Cleaning it up, the final pipeline will look like this:
model.eval()
for batch_id, inputs in enumerate(train_dataloader):
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)  # SGD expects an iterable of tensors
    input_tensor = inputs["input_tensor"]  # different images from the loader
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph each step
        loss = loss_fn(output)
        model.zero_grad()      # clear gradients accumulated on the model
        optimizer.zero_grad()  # clear gradients on init_tensor
        loss.backward()
        optimizer.step()
        loss = loss.item()
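To sanity-check that pipeline end to end, here is a self-contained version with a toy model, a placeholder loss, and a synthetic stand-in for the DataLoader — everything beyond the loop structure itself is an assumption of mine:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)          # stand-in model
with torch.no_grad():
    model.weight.fill_(0.5)      # fixed weights so the run is reproducible
    model.bias.fill_(0.1)
model.eval()

def loss_fn(output):
    return output.pow(2).mean()  # placeholder loss: push the output toward zero

learning_rate, threshold = 0.1, 1e-4
train_dataloader = [             # toy stand-in for the real DataLoader
    {"init_tensor": torch.zeros(1, 3), "input_tensor": torch.randn(1, 3)}
    for _ in range(2)
]

for batch_id, inputs in enumerate(train_dataloader):
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)
    input_tensor = inputs["input_tensor"]
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph every step
        loss = loss_fn(output)
        model.zero_grad()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss = loss.item()
```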
I still have one doubt:
How can we use what the optimizer learned from the inputs of batch_id=1 to improve the optimization of the inputs of batch_id=2? Since we initialize a new optimizer for each batch, we throw away everything it has learned from the inputs of batch_id=1, so the optimizer should spend almost the same amount of time optimizing the inputs of batch_id=2 as it did on batch_id=1. How can we improve this?