# Trainable scalar every n batches

Dear Forum,

Is it possible to create a scalar that is only trained every n batches? I would create my scalar alpha (initialized at 1) via:

`alpha = nn.Parameter(torch.ones(1, requires_grad=True))`

Am I correct to assume that the value of alpha then changes during training, e.g. becomes 0.9? Or did I set this up incorrectly?

How can I prevent alpha from being updated on every `.step()`, and only train it every nth batch? Intuitively I would set `model.alpha.requires_grad = False` during the other batches. Then, every nth batch:

```
for param in model.parameters():
    param.requires_grad = True
```

Hi L!

There are a number of ways to do this. I would probably use two
separate optimizers:

```
optModel = torch.optim.SGD(myModel.parameters(), lr=0.1)
optAlpha = torch.optim.SGD([alpha], lr=0.1)
```

(This assumes that `alpha` is not one of `myModel`'s parameters.)

Now you can call `optModel.step()` for every batch and only call
`optAlpha.step()` after every `n` batches.
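For concreteness, here is one way such a loop could look (this is just a sketch; the `dataloader`, the loss function `criterion`, and the update interval `n` are assumed, not from your post):

```
for i, (inputs, targets) in enumerate(dataloader):
    optModel.zero_grad()
    loss = criterion(myModel(inputs), targets)
    loss.backward()
    optModel.step()               # update the model every batch
    if (i + 1) % n == 0:          # update alpha only every n-th batch
        optAlpha.step()
        optAlpha.zero_grad()      # reset alpha's accumulated gradient
```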

Note that every call to `loss.backward()` will accumulate loss's
gradient with respect to `alpha` into `alpha.grad`. If you don't want
this, you might do something like:

```
optAlpha.zero_grad()
# calculate loss for just one batch ...
loss.backward()   # alpha.grad is now just from one batch
optAlpha.step()
```

You can make such a scheme work, but note that if your optimizer is
using momentum or weight decay, having a parameter's `grad` be zero
will not prevent that parameter's value from being changed by an
`opt.step()` update.
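
You can see this directly with a tiny standalone example (weight decay shown here; momentum behaves similarly once its buffer is non-zero):

```
import torch

p = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.SGD([p], lr=0.1, weight_decay=0.01)

p.grad = torch.zeros(1)   # gradient is exactly zero
opt.step()
print(p.item())           # ~0.999 -- weight decay alone changed the value
```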

Best.

K. Frank


Thank you for your detailed reply. I think this solves my problem. It didn't occur to me to think about the impact of weight decay. I am an AdamW fan, so I will have to consider this.