Is it possible to create a scalar that is only trained on every nth batch? I would create my scalar alpha (initialized at 1) with:
alpha = nn.Parameter(torch.ones(1, requires_grad=True))
Am I correct to assume that the value of alpha then changes during training, e.g. becomes 0.9? Or did I set this up incorrectly?
How can I avoid alpha being trained on every .step() call, and instead only train it on every nth batch? Intuitively, I would set model.alpha.requires_grad = False during the other batches, and then, on every nth batch:
for param in model.parameters():
    param.requires_grad = False
model.alpha.requires_grad = True
Thanks in advance for your help.
There are a number of ways to do this. I would probably use two optimizers:
optModel = torch.optim.SGD(myModel.parameters(), lr=0.1)
optAlpha = torch.optim.SGD([alpha], lr=0.1)
(This assumes that alpha is not one of myModel.parameters().)
Now you can call optModel.step() for every batch and only call optAlpha.step() after every nth batch.
Note that every call to loss.backward() will accumulate loss's gradient with respect to alpha in alpha.grad. If you don't want this accumulation, you might do something like:
optAlpha.zero_grad()  # clear any previously accumulated gradient for alpha
# calculate loss for just one batch ...
loss.backward()  # alpha.grad is now just from one batch
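Putting the pieces together, here is a minimal sketch of such a loop. The model, data, loss function, and the choice to have alpha scale the model's output are all just stand-ins for illustration; how alpha actually enters your loss is up to you.

import torch
import torch.nn as nn

# hypothetical setup: a tiny model, a trainable scalar alpha, and random data
myModel = nn.Linear(10, 1)
alpha = nn.Parameter(torch.ones(1))
criterion = nn.MSELoss()

optModel = torch.optim.SGD(myModel.parameters(), lr=0.1)
optAlpha = torch.optim.SGD([alpha], lr=0.1)

n = 4  # hypothetical choice: update alpha only on every 4th batch

for i in range(20):  # stand-in for iterating over a dataloader
    inputs = torch.randn(8, 10)
    targets = torch.randn(8, 1)

    optModel.zero_grad()
    optAlpha.zero_grad()       # keep alpha.grad from accumulating across batches
    loss = criterion(alpha * myModel(inputs), targets)  # alpha scales the output here, as an example
    loss.backward()            # fills .grad for the model parameters and for alpha
    optModel.step()            # model parameters are updated every batch
    if (i + 1) % n == 0:
        optAlpha.step()        # alpha is updated only on every nth batch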
You can make such a scheme work, but note that if your optimizer is using momentum or weight decay, having a parameter's grad be zero will not prevent that parameter's value from being changed by an optimizer step.
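To see that caveat in action, here is a small sketch using SGD with weight decay; the parameter changes even though its gradient is exactly zero (AdamW's decoupled weight decay would likewise shrink it).

import torch

p = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.SGD([p], lr=0.1, weight_decay=0.01)

p.grad = torch.zeros_like(p)  # the gradient is exactly zero
opt.step()
print(p.item())               # no longer 1.0: weight decay pulled p toward zero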
Thank you for your detailed reply. I think this solves my problem. It didn't occur to me to think about the impact of weight decay. I am an AdamW fan, so I will have to take this into account.