Trainable scalar every n batches

Dear Forum,

Is it possible to create a scalar that is only trained on every nth batch? I would create my scalar alpha (initialized at 1) like this:

import torch
import torch.nn as nn

alpha = nn.Parameter(torch.ones(1))  # nn.Parameter sets requires_grad=True by default

Am I correct in assuming that the value of alpha then changes during training, e.g. becomes 0.9? Or did I set this up incorrectly?

How can I keep alpha from being updated by every .step() and only train it every nth batch? Intuitively, I would set model.alpha.requires_grad = False during the other batches and then, on every nth batch, do:

for param in model.parameters():
    param.requires_grad = False    # freeze everything ...
model.alpha.requires_grad = True   # ... except alpha
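
For context, here is a rough sketch of what I have in mind (the model below is just a stand-in for my real one, and I'm assuming alpha simply scales the output):

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)            # stand-in for my real layers
        self.alpha = nn.Parameter(torch.ones(1))  # trainable scalar, initialized at 1

    def forward(self, x):
        return self.alpha * self.linear(x)        # alpha just scales the output here

model = MyModel()

# on every nth batch: freeze everything except alpha
for name, param in model.named_parameters():
    param.requires_grad_(name == "alpha")

# on all other batches: the reverse
for name, param in model.named_parameters():
    param.requires_grad_(name != "alpha")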

Thanks in advance for your help.

Hi L!

There are a number of ways to do this. I would probably use two
separate optimizers:

optModel = torch.optim.SGD(myModel.parameters(), lr=0.1)
optAlpha = torch.optim.SGD([alpha], lr=0.1)

(This assumes that alpha is not one of myModel’s parameters.)

Now you can call optModel.step() for every batch and only call
optAlpha.step() after every n batches.
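
Roughly, such a loop could look like this (loader, criterion, n, and
the way alpha enters the loss are placeholders for whatever you
actually use):

for i, (inputs, targets) in enumerate(loader):
    optModel.zero_grad()
    loss = criterion(alpha * myModel(inputs), targets)  # assuming alpha scales the output
    loss.backward()                  # fills grads for both the model and alpha
    optModel.step()                  # update the model every batch

    if (i + 1) % n == 0:
        optAlpha.step()              # uses alpha.grad accumulated over the last n batches
        optAlpha.zero_grad()         # start accumulating afresh for the next n batches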

Note that every call to loss.backward() will accumulate loss's
gradient with respect to alpha into alpha.grad. If you don't want
this, you might do something like:

optAlpha.zero_grad()
# calculate loss for just one batch ...
loss.backward()   # alpha.grad is now just from one batch
optAlpha.step()

You can make such a scheme work, but note that if your optimizer
uses momentum or weight decay, having a parameter's grad be zero
will not prevent that parameter's value from being changed by an
opt.step() update.
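
As a small illustration (SGD with momentum, not your actual setup):
after one step with a real gradient, a step with an explicitly zero
gradient still moves the parameter, because the momentum buffer is
nonzero:

import torch
import torch.nn as nn

p = nn.Parameter(torch.ones(1))
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9)

(2.0 * p).sum().backward()    # p.grad is now 2.0
opt.step()                    # p moves from 1.0 to 0.8

p.grad = torch.zeros_like(p)  # explicitly zero the gradient
opt.step()                    # p still moves (to 0.62) via the momentum buffer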

Best.

K. Frank


Thank you for your detailed reply. I think this solves my problem. It didn't occur to me to think about the impact of weight decay. I am an AdamW fan, so I will have to consider this.
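
For my own notes, with the two-optimizer approach I'm thinking of giving alpha its own AdamW instance with weight_decay=0, so the decoupled weight decay doesn't shrink it on the steps where it does get updated (the learning rates here are just placeholders):

optModel = torch.optim.AdamW(myModel.parameters(), lr=1e-3, weight_decay=1e-2)
optAlpha = torch.optim.AdamW([alpha], lr=1e-3, weight_decay=0.0)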