Use PyTorch NN as optimiser: freeze weights and optimise over input

I am trying to make a simple regressor work as a function optimiser, for didactic purposes.
I found some posts on the topic, e.g. here, but I eventually got stuck.
I attach an MWE below.
I generate a dummy dataset and train a simple regressor on it.

Then I tried to freeze the weights and optimise over the inputs, after setting requires_grad = True on them. I simply define the loss as the model prediction and backpropagate. I am not sure, though, how to save the inputs so that I can re-use them as the starting point for the next iteration.

I am also unsure how (or whether) to add the inputs to the optimiser with optimizer.add_param_group, as I get an error (see below).

Here is the simplest example. The function is a simple quadratic, so its minimum is at the null tensor.

import copy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.model_selection import train_test_split

### Create dummy dataset
X = np.random.rand(1000,5)
y = np.apply_along_axis(lambda x: x[0]**2 + x[1]**2 , axis = 1, arr = X)[:, None]
X, y = torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)

# train-test split for model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.85, shuffle=True)

model = nn.Sequential(
    nn.Linear(5, 8),
    nn.ReLU(),
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function and optimizer
loss_fn = nn.MSELoss()  # mean square error
optimizer = optim.Adam(model.parameters(), lr=0.0001)

n_epochs = 50  # number of epochs to run
batch_size = 1  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

# Hold the best model
best_mse = np.inf   # init to infinity
best_weights = None
history = []

###
### TRAINING
###
for epoch in range(n_epochs):
    model.train()
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            X_batch = X_train[start:start+batch_size]
            y_batch = y_train[start:start+batch_size]
            # forward pass
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            # update weights
            optimizer.step()
            # print progress
            bar.set_postfix(mse=float(loss))


#### OPTIMISE OVER INPUT

# Freeze weights
for param in model.parameters():
    param.requires_grad = False
# Input to optimise, guess value for first iteration
X_0 = X_train[0]
# Set flag requires_grad to True
X_0.requires_grad = True

### Input optimisations

INPUT_OPTIMISATION_ITER = 200
for epoch in range(INPUT_OPTIMISATION_ITER):
    model.train()
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            # forward pass
            y_pred = model(X_0)
            ### loss = loss_fn(y_pred, y_batch)
            loss = y_pred
            # backward pass
            optimizer.zero_grad()
            loss.backward(retain_graph=True)
            # update weights
            optimizer.step()
            # print progress
            bar.set_postfix(mse=float(loss))

Assuming the above is correct, where do I find the actual optimised inputs at the end of the loop, so that I can re-use them for the next iteration?
I am also unsure how to add the inputs to the optimiser, perhaps with optimizer.add_param_group. Is this needed at all?
If I do

optimizer.add_param_group({"params": X_0})

I get the error

ValueError: some parameters appear in more than one parameter group
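
The error message itself can be reproduced in isolation. The sketch below assumes (this is only a guess about the situation above) that the tensor had already been registered with the optimiser, for example because the cell was run twice. The names `net`, `x0` and `opt` are illustrative and not taken from the code above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Linear(5, 1)                        # stand-in for the trained model
opt = optim.Adam(net.parameters(), lr=1e-3)  # optimiser holding the weights

x0 = torch.rand(5, requires_grad=True)       # leaf tensor to be optimised

# First registration succeeds: the optimiser now has two parameter groups.
opt.add_param_group({"params": [x0]})
n_groups = len(opt.param_groups)

# Registering the same tensor a second time raises the ValueError.
try:
    opt.add_param_group({"params": [x0]})
    second_call_error = None
except ValueError as err:
    second_call_error = str(err)

print(n_groups)
print(second_call_error)
```

A separate optimiser that holds only the input tensor avoids the question of parameter groups altogether.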

Thanks

Hi Michael!

I haven’t looked at your code in detail (and I don’t really know what you’re trying to
do), but something along these lines should be possible.

I would try something like:

X_0 = X_train[0].clone()              # you may or may not care, but clone() prevents modifications to X_0 from being reflected in X_train
X_0.requires_grad = True              # turns X_0 into a trainable tensor
opt = optim.Adam ([X_0], lr=0.0001)   # this is the key ingredient you seem to be missing
for  _ in range (1000):               # optimization loop for X_0
    y_pred = model (X_0)              # forward pass, recomputed on every iteration
    loss = y_pred                     # this choice of loss probably doesn't make sense
    opt.zero_grad()                   # zeros out X_0.grad
    loss.backward()                   # you do NOT want retain_graph = True
    opt.step()                        # updates value of X_0

As noted in the code above, this loss probably doesn’t make sense. The optimization
of X_0 will (depending on the current frozen values of the parameters of model)
simply drive X_0 so that y_pred becomes increasingly large and negative.
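
A small sketch of this effect, under a simplifying assumption: a single frozen linear layer stands in for the trained network, so that the behaviour is exact. With plain gradient descent on the input, the output drops by the same amount on every step and never settles. The names `net`, `x0` and `outputs` are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Linear(5, 1)                 # stand-in model (assumption: purely linear)
for p in net.parameters():
    p.requires_grad_(False)           # freeze the weights

x0 = torch.zeros(5, requires_grad=True)
opt = optim.SGD([x0], lr=0.1)         # the optimiser only sees the input

outputs = []
for _ in range(50):
    y_pred = net(x0)                  # forward pass, shape (1,)
    loss = y_pred.sum()               # the "loss" is the raw model output
    opt.zero_grad()
    loss.backward()
    opt.step()                        # moves x0 against the gradient
    outputs.append(loss.item())

print(outputs[0], outputs[-1])        # the output keeps decreasing
```

A network fitted to a bowl-shaped target on bounded data may behave better inside the training range, but nothing in this loss prevents the input from drifting outside that range.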

X_0 is the value of the input and is simply updated in place. When the next iteration
of the optimization loop runs, the new value of X_0 will be used.
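
To make the in-place update concrete, here is a minimal sketch under stated assumptions: a small untrained network with Tanh activations stands in for the trained model, and random data stands in for X_train. It checks that the frozen weights and the training row stay untouched while X_0 itself changes. The name `X_opt` is illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
X_train = torch.rand(10, 5)                       # stand-in training data
model = nn.Sequential(nn.Linear(5, 8), nn.Tanh(), nn.Linear(8, 1))
for p in model.parameters():
    p.requires_grad_(False)                       # freeze the weights

weights_before = [p.clone() for p in model.parameters()]
row_before = X_train[0].clone()

X_0 = X_train[0].clone().requires_grad_(True)     # independent, trainable copy
opt = optim.Adam([X_0], lr=0.01)                  # optimiser over the input only

for _ in range(100):
    loss = model(X_0).sum()                       # fresh forward pass each step
    opt.zero_grad()
    loss.backward()                               # no retain_graph needed
    opt.step()                                    # updates X_0 in place

X_opt = X_0.detach().clone()                      # the optimised input
print(X_opt)
```

After the loop, X_0 holds the current optimised input. Detaching and cloning it gives a plain tensor that can be stored or used as the next starting point.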

Best.

K. Frank

Thank you very much for this.

I have to admit, I am not entirely sure I got it working.
I am applying the procedure to the simplest toy problem, a quadratic function in two dimensions.
Once a model is trained, I would like to use it to find the input for which the output is minimal, that is, to use the optimiser “backwards”, with fixed weights and free inputs.

I have incorporated your suggestions below, but even after a large number of iterations the result is quite far from the expected (0.0, 0.0), the minimum of the quadratic function. To make the example less silly, I use 5 dimensions, but only the first two are used to compute the output.

import copy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.model_selection import train_test_split

## Get Training set
N_FEATURES = 5
X = np.random.rand(1000,N_FEATURES)-0.5
## Using only first two dimensions to compute y
y = np.apply_along_axis(lambda x: x[0]**2 + x[1]**2 , axis = 1, arr = X)[:, None]
X, y = torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)

############## TRAIN MODEL


# train-test split for model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.65, shuffle=True)

model = nn.Sequential(
    nn.Linear(N_FEATURES, 8),
    nn.ReLU(),
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)
# loss function and optimizer
loss_fn = nn.MSELoss()  # mean square error
optimizer = optim.Adam(model.parameters(), lr=0.0001)

n_epochs = 50  # number of epochs to run
batch_size = 1  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

# Hold the best model
best_mse = np.inf   # init to infinity
best_weights = None
history = []


for epoch in range(n_epochs):
    model.train()
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            X_batch = X_train[start:start+batch_size]
            y_batch = y_train[start:start+batch_size]
            # forward pass
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            # update weights
            optimizer.step()
            # print progress
            bar.set_postfix(mse=float(loss))
    # evaluate accuracy at end of each epoch
    model.eval()
    y_pred = model(X_test)
    mse = loss_fn(y_pred, y_test)
    mse = float(mse)
    history.append(mse)
    if mse < best_mse:
        best_mse = mse
        best_weights = copy.deepcopy(model.state_dict())

########## FIND OPTIMUM INPUT

for param in model.parameters():
    param.requires_grad = False

X_0 = X_train[0].clone()              # Cloning one training row as first guess for my gradient-descent optimiser
X_0.requires_grad = True              # turns X_0 into a trainable tensor
optimizer = optim.Adam ([X_0], lr=0.0001)
    
INPUT_OPTIMISATION_ITER = 500
for epoch in range(INPUT_OPTIMISATION_ITER):
    model.train()
    #optimizer = optim.Adam ([X_0], lr=0.0001)
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            # forward pass
            y_pred = model(X_0)
            ### loss = loss_fn(y_pred, y_batch)
            loss = y_pred
            # backward pass
            optimizer.zero_grad()
            loss.backward(retain_graph=True)
            # update weights
            optimizer.step()
            # print progress
            bar.set_postfix(mse=float(loss))



It runs without errors, but the final result is
tensor([ 0.0408, 0.2135, -0.1983, -0.0120, -0.1999], requires_grad=True)

The first two components are far from 0, where the minimum should be.
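
One possible way to narrow this down, sketched under assumptions rather than offered as a diagnosis: compare what the frozen network predicts at the input it found with what it predicts at the origin. If the network's own prediction is lower at the found input, the input optimisation did what it was asked to do, and the gap comes from how well the network approximates the quadratic. The sketch uses full-batch training and a target that depends on the first two features only. Both are choices made here for speed and clarity, and `best_x`, `best_val` and `pred_at_origin` are illustrative names.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data: inputs in [-0.5, 0.5), target uses the first two features only.
X = torch.rand(1000, 5) - 0.5
y = (X[:, 0] ** 2 + X[:, 1] ** 2).unsqueeze(1)

model = nn.Sequential(
    nn.Linear(5, 8), nn.ReLU(),
    nn.Linear(8, 12), nn.ReLU(),
    nn.Linear(12, 6), nn.ReLU(),
    nn.Linear(6, 1),
)

# Full-batch training (assumption: chosen to keep the sketch fast).
loss_fn = nn.MSELoss()
train_opt = optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    train_opt.zero_grad()
    loss_fn(model(X), y).backward()
    train_opt.step()

# Freeze the weights and optimise over a copy of one training row.
model.eval()
for p in model.parameters():
    p.requires_grad_(False)
x = X[0].clone().requires_grad_(True)
input_opt = optim.Adam([x], lr=1e-3)

# Track the best input seen so far, starting from the initial guess.
best_x = x.detach().clone()
start_val = model(best_x).item()
best_val = start_val
for _ in range(2000):
    out = model(x).sum()
    input_opt.zero_grad()
    out.backward()
    input_opt.step()
    val = model(x.detach()).item()
    if val < best_val:
        best_val, best_x = val, x.detach().clone()

pred_at_origin = model(torch.zeros(5)).item()
true_at_best = (best_x[0] ** 2 + best_x[1] ** 2).item()

print("prediction at start :", start_val)
print("prediction at best  :", best_val)
print("prediction at origin:", pred_at_origin)
print("true value at best  :", true_at_best)
print("best input          :", best_x)
```

Nothing in this loop keeps the input inside the range covered by the training data, so it is also worth checking whether the components of best_x still lie within [-0.5, 0.5].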