Differentiable Optimizer Not Working For Simple Example

import torch
import torch.nn as nn

class Net(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.device = "cpu"
        self.Initialize()

    def Initialize(self):
        self.FC = nn.Sequential(
                nn.LayerNorm(10),
                nn.Linear(10, 5),).to(self.device)
    
        
    def Inference(self, s):
        ou = self.FC(s)
        return ou


    def forward(self, s):
        return self.Inference(s)


model = Net()
model.train()

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, differentiable=True)

z = torch.randn((10,10))

a = model(z)

loss = (a - 1).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

Running the minimal example above produces the following error:
File "…\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\optim\adam.py", line 413, in single_tensor_adam
    param.addcdiv_(exp_avg, denom)
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

PyTorch 2.4

This is expected! The usual use case for a differentiable optimizer is to learn the best hyperparameters of the optimizer itself, e.g., the LR or weight decay, and the key to doing that is making the parameters non-leaf tensors. One way to make the parameters non-leaf is to clone them all before passing them into the optimizer, as in the sketch below.
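
A minimal sketch of that workaround, assuming the Net from the question. Using torch.func.functional_call to route the forward pass through the cloned tensors is my choice here, not part of the original answer; any way of making the loss depend on the clones would do.

import torch
from torch.func import functional_call

model = Net()
model.train()

# A clone of a leaf tensor that requires grad is itself non-leaf,
# so the differentiable optimizer can update it in place without the error.
names, leaf_params = zip(*model.named_parameters())
params = [p.clone() for p in leaf_params]
for p in params:
    p.retain_grad()  # non-leaf tensors only keep .grad if asked to

optimizer = torch.optim.Adam(params, lr=0.001, differentiable=True)

z = torch.randn((10, 10))

# Run the forward pass with the cloned parameters so the loss (and its
# gradients) depend on the tensors the optimizer actually steps.
a = functional_call(model, dict(zip(names, params)), (z,))
loss = (a - 1).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()  # no in-place-on-leaf RuntimeError now

Because the clones are non-leaf, the update recorded by optimizer.step() stays in the autograd graph, which is what later lets you backpropagate through the step to hyperparameters such as the LR.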

(As an aside, PyTorch support for Tensor LR for differentiable optimizers is admittedly sparse and is now on my mind as something to improve.)