Is it possible to create a scalar that is only trained on every nth batch? I would create my scalar alpha (initialized at 1) with:
alpha = nn.Parameter(torch.ones(1, requires_grad=True))
Am I correct to assume that the value of alpha then changes during training, e.g. becomes 0.9? Or did I set this up incorrectly?
How can I avoid alpha being trained on every .step() call, and instead only train it on every nth batch? Intuitively, I would set model.alpha.requires_grad = False during the other batches, and then, on every nth batch:
for param in model.parameters():
    param.requires_grad = False
model.alpha.requires_grad = True
Thanks in advance for your help.
There are a number of ways to do this. I would probably use two optimizers:
optModel = torch.optim.SGD(myModel.parameters(), lr=0.1)
optAlpha = torch.optim.SGD([alpha], lr=0.1)
(This assumes that alpha is not one of myModel.parameters().)
Now you can call optModel.step() for every batch and only call optAlpha.step() after every nth batch.
Note that every call to loss.backward() will accumulate loss's gradient with respect to alpha in alpha.grad. If you don't want this accumulation, you might do something like:
optAlpha.zero_grad()  # clear any previously accumulated gradient for alpha
# calculate loss for just one batch ...
loss.backward()  # alpha.grad is now just from one batch
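Putting the pieces together, here is a minimal sketch of such a loop. The model, data, loss function, and the choice to have alpha scale the model's output are all just stand-ins for illustration; how alpha actually enters your loss is up to you.

import torch
import torch.nn as nn

# hypothetical setup: a tiny model, a trainable scalar alpha, and random data
myModel = nn.Linear(10, 1)
alpha = nn.Parameter(torch.ones(1))
criterion = nn.MSELoss()

optModel = torch.optim.SGD(myModel.parameters(), lr=0.1)
optAlpha = torch.optim.SGD([alpha], lr=0.1)

n = 4  # hypothetical choice: update alpha only on every 4th batch

for i in range(20):  # stand-in for iterating over a dataloader
    inputs = torch.randn(8, 10)
    targets = torch.randn(8, 1)

    optModel.zero_grad()
    optAlpha.zero_grad()       # keep alpha.grad from accumulating across batches
    loss = criterion(alpha * myModel(inputs), targets)  # alpha scales the output here, as an example
    loss.backward()            # fills .grad for the model parameters and for alpha
    optModel.step()            # model parameters are updated every batch
    if (i + 1) % n == 0:
        optAlpha.step()        # alpha is updated only on every nth batch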
You can make such a scheme work, but note that if your optimizer is using momentum or weight decay, having a parameter's grad be zero will not prevent that parameter's value from being changed by an optimizer step.
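To see that caveat in action, here is a small sketch using SGD with weight decay; the parameter changes even though its gradient is exactly zero (AdamW's decoupled weight decay would likewise shrink it).

import torch

p = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.SGD([p], lr=0.1, weight_decay=0.01)

p.grad = torch.zeros_like(p)  # the gradient is exactly zero
opt.step()
print(p.item())               # no longer 1.0: weight decay pulled p toward zero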
Thank you for your detailed reply. I think this solves my problem. It didn't occur to me to think about the impact of weight decay. I am an AdamW fan, so I will have to take this into account.