Use PyTorch NN as optimiser: freeze weights and optimise over input

I am trying to use a simple regressor as a function optimiser, for didactic purposes.
I found some posts on the topic, e.g. here, but I eventually got stuck.
I attach an MWE below.
I generate a dummy dataset and train a simple regressor on it.

Then I tried to freeze the weights and optimise over the inputs, after setting requires_grad = True on them. I simply define the loss as the model prediction and try to backpropagate. I am not sure, though, how to save the inputs so that I can reuse them as the starting point for the next iteration.

I am also unsure how (or whether) to add the inputs to the optimiser with optimizer.add_param_group, as I get an error (please see below).

Here is the simplest example. The function is a simple quadratic, so the minimum is a null tensor.

import copy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.model_selection import train_test_split

### Create dummy dataset
X = np.random.rand(1000,5)
y = np.apply_along_axis(lambda x: x[0]**2 + x[1]**2 , axis = 1, arr = X)[:, None]
X, y = torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)

# train-test split for model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.85, shuffle=True)

model = nn.Sequential(
    nn.Linear(5, 8),
    nn.ReLU(),
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function and optimizer
loss_fn = nn.MSELoss()  # mean square error
optimizer = optim.Adam(model.parameters(), lr=0.0001)

n_epochs = 50  # number of epochs to run
batch_size = 1  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

# Hold the best model
best_mse = np.inf   # init to infinity
best_weights = None
history = []

###
### TRAINING
###
for epoch in range(n_epochs):
    model.train()
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            X_batch = X_train[start:start+batch_size]
            y_batch = y_train[start:start+batch_size]
            # forward pass
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            # backward pass
            optimizer.zero_grad()
            loss.backward()
            # update weights
            optimizer.step()
            # print progress
            bar.set_postfix(mse=float(loss))


#### OPTIMISE OVER INPUT

# Freeze weights
for param in model.parameters():
    param.requires_grad = False
# Input to optimise, guess value for first iteration
X_0 = X_train[0]
# Set flag requires_grad to True
X_0.requires_grad = True

### Input optimisations

INPUT_OPTIMISATION_ITER = 200
for epoch in range(INPUT_OPTIMISATION_ITER):
    model.train()
    with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
        bar.set_description(f"Epoch {epoch}")
        for start in bar:
            # take a batch
            # forward pass
            y_pred = model(X_0)
            ### loss = loss_fn(y_pred, y_batch)
            loss = y_pred
            # backward pass
            optimizer.zero_grad()
            loss.backward(retain_graph=True)
            # update weights
            optimizer.step()
            # print progress
            bar.set_postfix(mse=float(loss))

Assuming the above is correct, where do I find the actual optimised inputs at the end of the loop, so that I can reuse them for the next iteration?
I am also unsure how to add the inputs to the optimiser, perhaps with optimizer.add_param_group. Is this needed at all?
If I do

optimizer.add_param_group({"params": X_0})

I get the error

ValueError: some parameters appear in more than one parameter group

Thanks

Hi Michael!

I haven’t looked at your code in detail (and I don’t really know what you’re trying to
do), but something along these lines should be possible.

I would try something like:

X_0 = X_train[0].clone()              # you may or may not care, but clone() prevents modifications to X_0 from being reflected in X_train
X_0.requires_grad = True              # turns X_0 into a trainable tensor
opt = optim.Adam([X_0], lr=0.0001)    # this is the key ingredient you seem to be missing
for _ in range(1000):                 # optimization loop for X_0
    y_pred = model(X_0)               # forward pass with the current X_0 (rebuilds the graph each iteration)
    loss = y_pred                     # this choice of loss probably doesn't make sense
    opt.zero_grad()                   # zeros out X_0.grad
    loss.backward()                   # you do NOT want retain_graph = True
    opt.step()                        # updates value of X_0

As noted in the code above, this loss probably doesn’t make sense. The optimization
of X_0 will (depending on the current frozen values of the parameters of model)
simply drive X_0 so that y_pred becomes increasingly large and negative.
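
To make the unboundedness concrete, here is a minimal sketch, not your actual model. A hypothetical frozen linear layer computing y = x0 - 2*x1 stands in for the trained regressor, and plain SGD replaces Adam so the arithmetic is easy to follow. The helper minimise_output and the choice of clamping the input to [0, 1] (the range your training data was drawn from) are assumptions for illustration; clamping is one simple way to keep the problem bounded, not the only one.

```python
import torch
import torch.nn as nn

# Hypothetical frozen "model": y = x0 - 2*x1 (stand-in for the trained regressor)
model = nn.Linear(2, 1)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[1.0, -2.0]]))
    model.bias.zero_()
for p in model.parameters():
    p.requires_grad_(False)          # freeze the weights


def minimise_output(clamp_to_unit_box, n_steps=100):
    """Minimise the raw model output over the input (illustrative helper)."""
    x = torch.tensor([0.5, 0.5], requires_grad=True)
    opt = torch.optim.SGD([x], lr=0.1)   # the optimizer holds the input, not the weights
    for _ in range(n_steps):
        opt.zero_grad()
        loss = model(x).sum()            # scalar loss: the prediction itself
        loss.backward()
        opt.step()
        if clamp_to_unit_box:
            with torch.no_grad():
                x.clamp_(0.0, 1.0)       # project the input back into [0, 1]
    return x.detach(), model(x).item()


free_x, free_y = minimise_output(clamp_to_unit_box=False)
box_x, box_y = minimise_output(clamp_to_unit_box=True)
print("unconstrained:", free_x, free_y)
print("clamped:      ", box_x, box_y)
```

Without the clamp the input keeps moving in the same direction and the output keeps decreasing for as long as you iterate; with it, the input settles on a corner of the box.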

X_0 is the value of the input and is simply updated in place. When the next iteration
of the optimization loop runs, the new value of X_0 will be used.
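
For what it's worth, here is a self-contained sketch of that in-place update. It is an illustration under assumptions: the Quadratic module is a hypothetical stand-in for your trained, frozen network (it computes x0**2 + x1**2 exactly, so the loss is bounded below), the random X_train is a small placeholder for your data, and the learning rate and step count are arbitrary. The optimised input is simply X_0 itself after the loop.

```python
import torch
import torch.nn as nn


class Quadratic(nn.Module):
    """Hypothetical stand-in for the frozen regressor: x0^2 + x1^2."""

    def forward(self, x):
        return x[..., 0] ** 2 + x[..., 1] ** 2


torch.manual_seed(0)
model = Quadratic()
X_train = torch.rand(10, 5)                     # placeholder training inputs

X_0 = X_train[0].clone().requires_grad_(True)   # trainable copy of the first sample
start = X_0.detach().clone()                    # remember the starting point
opt = torch.optim.Adam([X_0], lr=0.05)          # optimizer over the input only

for _ in range(500):
    opt.zero_grad()
    loss = model(X_0)                           # fresh forward pass every iteration
    loss.backward()
    opt.step()                                  # updates X_0 in place

X_opt = X_0.detach()                            # the optimised input lives in X_0
print("start:    ", start)
print("optimised:", X_opt)
```

The first two components move toward zero, the other three receive zero gradient and stay where they started, and X_train[0] is untouched because of the clone().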

Best.

K. Frank