What’s wrong with initializing the optimizer inside the training loop?
In the end, if you want to optimize the input, you will need to initialize an optimizer per input, I would say.
Once you have optimized the input, the old optimizer is no longer useful.
If you want to optimize both the model and the init tensor, you may think about having two optimizers: one for the model and another one for each tensor.
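A minimal sketch of that two-optimizer setup (the `Linear` model and the squared-output loss here are placeholders of mine, not your actual setup):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                              # stand-in model
init_tensor = torch.randn(1, 4, requires_grad=True)  # tensor to be optimized

# One optimizer for the model's parameters, another for this particular tensor.
model_opt = optim.SGD(model.parameters(), lr=0.1)
tensor_opt = optim.SGD([init_tensor], lr=0.1)

output = model(init_tensor)
loss = output.pow(2).sum()  # placeholder loss
model_opt.zero_grad()
tensor_opt.zero_grad()
loss.backward()
model_opt.step()   # updates the model weights
tensor_opt.step()  # updates init_tensor in place
```

Both optimizers see the same `loss.backward()`; each `step()` only touches the tensors it wraps.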
But when we initialize the optimizer as shown above, it can only take one step per input_tensor. So would you recommend taking multiple steps for each input_tensor until a condition is met, as shown below?
model.eval()
for inputs in train_dataloader:
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)  # SGD expects an iterable of tensors
    input_tensor = inputs["input_tensor"]  # different images from the loader
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph each step
        loss = loss_fn(output)
        model.zero_grad()
        loss.backward()
        optimizer.step()
        loss = loss.item()
I mean, as far as I understand, if you are optimizing a tensor, that means the tensor is somehow going to “remain” across several iterations. It’s a bit difficult to discuss this without knowing what you are trying to do, but yes, iterating over that tensor until you reach your target would be the expected pipeline.
Whether the adequate criterion is the loss or some other metric, I cannot know.
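As a concrete illustration of that pipeline — freeze the model, wrap only the input tensor in the optimizer, and iterate until the loss drops below a threshold. The model, its fixed weights, the target, and the threshold are all made-up placeholders:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)          # stand-in model
with torch.no_grad():
    model.weight.fill_(0.5)      # fixed weights so the example is reproducible
    model.bias.fill_(0.1)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)      # freeze the model; only the input is optimized

init_tensor = torch.ones(1, 3, requires_grad=True)
optimizer = optim.SGD([init_tensor], lr=0.1)
loss_fn = nn.MSELoss()
target = torch.zeros(1, 1)       # drive the model output toward zero
threshold = 1e-4

for step in range(1000):         # cap the iterations instead of a bare while-loop
    loss = loss_fn(model(init_tensor), target)
    if loss.item() < threshold:
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Capping the iteration count guards against a threshold that the optimization never reaches.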
In my above code snippet, I have called model.zero_grad(), but I don’t think that is correct. As I’m optimizing init_tensor, I should zero the gradients through the optimizer instead, as below:
model.eval()
for inputs in train_dataloader:
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)  # SGD expects an iterable of tensors
    input_tensor = inputs["input_tensor"]  # different images from the loader
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph each step
        loss = loss_fn(output)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss = loss.item()
That way, though, gradients keep accumulating on the model. Since the model isn’t wrapped by the optimizer, its gradients are never zeroed. (In PyTorch, if you don’t zero gradients, they accumulate on top of the previous ones.)
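A tiny demo of that accumulation behaviour on a single tensor:

```python
import torch

w = torch.ones(1, requires_grad=True)

(2 * w).sum().backward()
print(w.grad)    # tensor([2.])

# Without zeroing, the next backward() adds to the existing .grad:
(2 * w).sum().backward()
print(w.grad)    # tensor([4.])

w.grad.zero_()   # roughly what zero_grad() does for the tensors it wraps
(2 * w).sum().backward()
print(w.grad)    # tensor([2.])
```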
Thanks @JuanFMontesinos. Cleaning it up, the final pipeline will look like this:
model.eval()
for batch_id, inputs in enumerate(train_dataloader):
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)  # SGD expects an iterable of tensors
    input_tensor = inputs["input_tensor"]  # different images from the loader
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph each step
        loss = loss_fn(output)
        model.zero_grad()      # clear gradients accumulated on the model
        optimizer.zero_grad()  # clear gradients on init_tensor
        loss.backward()
        optimizer.step()
        loss = loss.item()
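To sanity-check that pipeline end to end, here is a self-contained version with a toy model, a placeholder loss, and a synthetic stand-in for the DataLoader — everything beyond the loop structure itself is an assumption of mine:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)          # stand-in model
with torch.no_grad():
    model.weight.fill_(0.5)      # fixed weights so the run is reproducible
    model.bias.fill_(0.1)
model.eval()

def loss_fn(output):
    return output.pow(2).mean()  # placeholder loss: push the output toward zero

learning_rate, threshold = 0.1, 1e-4
train_dataloader = [             # toy stand-in for the real DataLoader
    {"init_tensor": torch.zeros(1, 3), "input_tensor": torch.randn(1, 3)}
    for _ in range(2)
]

for batch_id, inputs in enumerate(train_dataloader):
    init_tensor = inputs["init_tensor"].requires_grad_()  # tensor to be optimized
    optimizer = optim.SGD([init_tensor], lr=learning_rate)
    input_tensor = inputs["input_tensor"]
    loss = float("inf")
    while loss > threshold:
        output = model(init_tensor + input_tensor)  # rebuild the graph every step
        loss = loss_fn(output)
        model.zero_grad()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        loss = loss.item()
```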
I still have one doubt:
How can we use what the optimizer learned from the inputs of batch_id=1 to improve the optimization of the inputs of batch_id=2? Since we initialize a new optimizer for each batch, we throw away everything it has learned from the inputs of batch_id=1, so the optimizer should spend almost the same amount of time optimizing the inputs of batch_id=2 as it did on batch_id=1. How can we improve this?