Suppose I have an input tensor x of shape [B, n_features]. Instead of optimizing the parameters of the network, I want to freeze the network and optimize only a few columns of the input. Conceptually, like this:
x = torch.rand(batch_size, n_features)
x.requires_grad_()
col = [0, 2]  # pick out several specific columns
opt = torch.optim.Adam(params=[x[:, col]], lr=0.01)
for epoch in range(epochs):
    opt.zero_grad()
    y = model(x)
    loss = Loss(x, y)
    loss.backward()
    opt.step()
If I run it as written, the optimizer complains:
ValueError: can't optimize a non-leaf Tensor
Is there a canonical way to do this?
Thanks in advance.
Because pytorch operates on entire tensors, there's no simple, efficient way to
optimize just a portion of your input tensor.
The cleanest general way to do what you want is to let pytorch optimize your
entire x and then restore the original values of the columns of x that you didn’t
want to optimize.
Something like this:
x = torch.rand(batch_size, n_features)
col = [0, 2]  # the columns you want to optimize
frozen_cols = [c for c in range(n_features) if c not in col]  # everything else stays fixed
x_frozen = x[:, frozen_cols].clone()  # save values of the columns to freeze
x.requires_grad_()
opt = torch.optim.Adam(params=[x], lr=0.01)  # optimize all of x
for epoch in range(epochs):
    opt.zero_grad()
    y = model(x)
    loss = Loss(x, y)
    loss.backward()
    opt.step()
    with torch.no_grad():
        x[:, frozen_cols] = x_frozen  # restore the values of the frozen columns
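For concreteness, here is a minimal, self-contained sketch of the same idea. The
Linear model, the (y ** 2).mean() loss, the sizes, and the epoch count are
illustrative stand-ins for your own model, Loss, batch_size, and n_features:

import torch

batch_size, n_features = 4, 5
model = torch.nn.Linear(n_features, 1)        # stand-in for your frozen network

x = torch.rand(batch_size, n_features)
col = [0, 2]                                  # columns you want to optimize
frozen_cols = [c for c in range(n_features) if c not in col]
x_frozen = x[:, frozen_cols].clone()          # values to restore after each step

x.requires_grad_()
opt = torch.optim.Adam(params=[x], lr=0.01)   # optimize all of x

for epoch in range(100):
    opt.zero_grad()
    y = model(x)
    loss = (y ** 2).mean()                    # stand-in for Loss(x, y)
    loss.backward()
    opt.step()
    with torch.no_grad():
        x[:, frozen_cols] = x_frozen          # undo any update to the frozen columns

print(torch.equal(x[:, frozen_cols], x_frozen))  # True -- frozen columns unchanged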
(As an aside, you might want to turn off requires_grad for your model
parameters. As it stands, your model won't be optimized, but gradients
for its parameters will still be accumulated by every call to loss.backward().
This costs time in the backward pass and memory in the computation graph
(which may or may not matter, depending on your use case). It could also,
hypothetically, lead to overflow in the .grads of your model parameters if
you train for many iterations. If this is a concern, you could call
model.zero_grad().)
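Concretely, a short sketch of that (model is whatever frozen network you're
already using):

# disable gradient tracking for the model's parameters, so loss.backward()
# no longer accumulates .grad for them or does the extra backward-pass work
for p in model.parameters():
    p.requires_grad_(False)

# or, alternatively, clear any accumulated parameter gradients each iteration:
# model.zero_grad()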