I have a simple 2-layer MLP and I'm trying to optimise some hyperparameters that govern the network weights, but they seem to remain unchanged during training.

E.g.

model = MLP(fan_in=2, hidden=50, fan_out=3)

# hyper-params to optimise
a = torch.nn.Parameter(torch.tensor([1.3]))
b = torch.nn.Parameter(torch.tensor([2.4]))
opt = torch.optim.SGD([a, b], lr=lr, momentum=momentum, weight_decay=weight_decay)

for epoch in range(n_epochs):
    for X, y in loader:
        # scale/shift the network weights using the hyper-params
        for p in model.parameters():
            p.data = p.data * a + b
        logits = model(X)
        loss = cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
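To make the symptom reproducible, here is a minimal self-contained sketch of the same loop (an nn.Sequential stands in for the MLP, and the data, lr, and single iteration are made-up placeholders). After backward(), a.grad is None and a keeps its initial value, which is exactly the behaviour described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# stand-in for the 2-layer MLP from the question
model = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 3))
a = torch.nn.Parameter(torch.tensor([1.3]))
b = torch.nn.Parameter(torch.tensor([2.4]))
opt = torch.optim.SGD([a, b], lr=0.1)

X = torch.randn(8, 2)                 # placeholder minibatch
y = torch.randint(0, 3, (8,))

for p in model.parameters():
    p.data = p.data * a + b           # .data assignment detaches: a, b leave no trace in the graph
logits = model(X)
loss = F.cross_entropy(logits, y)
opt.zero_grad()
loss.backward()
opt.step()

print(a.grad)                         # None: no gradient ever reaches a
print(a.item())                       # still (approximately) 1.3
```

Because a.grad stays None, SGD simply skips a and b on every step, so they never move from their initial values.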

In every iteration I print a and b, and they still contain their initial values, i.e. (1.3, 2.4) respectively.

That is just a formatting typo from copy-pasting. I'll fix it in the original question, but the point remains: the params (a, b) defined through nn.Parameter are not changing. Why?

It seems that the values of a and b do change if they are incorporated directly into the input of the loss, since loss.backward() then computes their gradients and opt.step() updates their values accordingly.
E.g.:

logits = model(X)
logits += a * b
loss = cross_entropy(logits, y)
loss.backward()

But what if we want to optimise over params that only implicitly affect the loss and are not a direct input to the loss function?
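One way to keep an "implicit" dependence differentiable is to apply the (a, b) transform inside the forward computation instead of overwriting p.data. A hedged sketch under assumed names (W is a fixed base weight matrix, the model is simplified to a single linear layer, and lr and the data are placeholders):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(3, 2)                 # fixed base weights (not trained in this sketch)
a = torch.nn.Parameter(torch.tensor([1.3]))
b = torch.nn.Parameter(torch.tensor([2.4]))
opt = torch.optim.SGD([a, b], lr=0.1)

X = torch.randn(8, 2)                 # placeholder minibatch
y = torch.randint(0, 3, (8,))

# the transform W * a + b is part of the autograd graph,
# so the loss depends on a and b through the effective weights
logits = F.linear(X, W * a + b)
loss = F.cross_entropy(logits, y)
opt.zero_grad()
loss.backward()
opt.step()

print(a.grad)                         # populated: gradients flow back through the weights
```

Here a and b affect the loss only through the weights, yet they receive gradients and are updated, because the transform is never detached from the graph.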