I have a simple 2-layer MLP and I'm trying to optimise two hyperparameters that scale and shift the network weights, but they never change during training. Here's a simplified version of what I'm doing:
import torch
import torch.nn.functional as F

model = MLP(fan_in=2, hidden=50, fan_out=3)

# hyper-params to optimise
a = torch.nn.Parameter(torch.tensor([1.3]))
b = torch.nn.Parameter(torch.tensor([2.4]))
# lr / momentum / weight_decay are defined elsewhere
opt = torch.optim.SGD([a, b], lr=lr, momentum=momentum, weight_decay=weight_decay)

for epoch in range(num_epochs):
    for X, y in loader:
        # scale and shift every network weight by the hyper-params
        for p in model.parameters():
            p.data = p.data * a + b
        logits = model(X)
        loss = F.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
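For reference, the MLP itself is nothing fancy; a minimal stand-in would look like this (the exact definition shouldn't matter, but just in case):

import torch.nn as nn

class MLP(nn.Module):
    # two linear layers with a ReLU in between
    def __init__(self, fan_in, hidden, fan_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(fan_in, hidden),
            nn.ReLU(),
            nn.Linear(hidden, fan_out),
        )

    def forward(self, x):
        return self.net(x)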
On every iteration I print a and b, and they still contain their initial values, i.e. 1.3 and 2.4 respectively.
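Concretely, the logging is just this (a sketch of what I run right after opt.step()):

print(a.item(), b.item())   # always prints 1.3 2.4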
Any idea what I’m missing?