Hello,

I have a generative model that is learning a latent variable using backprop as part of the generation process. For each sample there is a custom latent variable and they have nothing to do with each other. The only reason I use batches with more than one sample is to speed things up.

I want to find out if the performance can be increased by using a learning rate scheduler. However, since the samples are independent, I need one scheduler for each sample. In order to do this, I probably need one parameter group for each sample. Let’s say I have 512 samples in my batch. I have tried the following:

```
param_groups = [{"params": [latent[i]]} for i in range(latent.shape[0])]
opt = optim.Adam(params=param_groups)
```

When running this I get

```
ValueError: can't optimize a non-leaf Tensor
```

which makes sense, because the original tensor is `latent` and I am slicing it, so the slice is not a leaf anymore. So I tried this instead:

```
param_groups = [{"params": [latent[i].detach().requires_grad_(True)]} for i in range(latent.shape[0])]
opt = optim.Adam(params=param_groups)
```

This solves the non-leaf issue, but (of course) the detached copies are cut off from the original latent tensor, so when I run the following in my backprop loop, the gradient never reaches the parameters the optimizer actually sees:

```
generated = self.do_some_fancy_stuff(latent)
loss = my_loss(generated, groundtruth)
loss.backward()
opt.step()
```

I could pass each sample separately through the model and backprop the loss to the slice of the latent tensor, but that is much slower.
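To be concrete, what I mean by the slow per-sample variant is roughly this (the model and loss below are placeholders standing in for my real `do_some_fancy_stuff` and `my_loss`):

```python
import torch

torch.manual_seed(0)
B, D = 4, 8
do_fancy = lambda z: z * 2.0                        # placeholder model
my_loss = lambda gen, gt: ((gen - gt) ** 2).mean()  # placeholder loss
groundtruth = torch.randn(B, D)

# One leaf tensor and one optimizer (so one learning rate) per sample.
latents = [torch.randn(D, requires_grad=True) for _ in range(B)]
opts = [torch.optim.Adam([z], lr=1e-2) for z in latents]

# Each sample is passed through the model on its own -> much slower.
for i in range(B):
    opts[i].zero_grad()
    loss = my_loss(do_fancy(latents[i]), groundtruth[i])
    loss.backward()
    opts[i].step()
```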

So my question is: is there some way to make the slice `latent[i]` share its gradient with the corresponding row of `latent`?
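The closest workaround I can think of is to keep one leaf tensor per sample and `torch.stack` them inside the forward pass: every leaf can then get its own param group, while the batch still goes through the model in a single pass. A rough sketch (again with placeholder model and loss):

```python
import torch

torch.manual_seed(0)
B, D = 4, 8
do_fancy = lambda z: z * 2.0                        # placeholder model
my_loss = lambda gen, gt: ((gen - gt) ** 2).mean()  # placeholder loss
groundtruth = torch.randn(B, D)

# One leaf per sample, each in its own param group.
leaves = [torch.randn(D, requires_grad=True) for _ in range(B)]
opt = torch.optim.Adam([{"params": [z]} for z in leaves])

opt.zero_grad()
latent = torch.stack(leaves)   # (B, D) batch; non-leaf, but stays in the graph
loss = my_loss(do_fancy(latent), groundtruth)
loss.backward()                # gradients land on each individual leaf
opt.step()
```

I am not sure whether stacking on every iteration adds noticeable overhead, though.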

I have also tried this:

```
loss.backward()
for i in range(latent.shape[0]):
    opt.param_groups[i]["params"][0].grad = latent.grad[i]
```

The idea is to calculate the gradient on the whole batch and then set the gradient of the slices manually, but then I get this when calling `opt.zero_grad()`:

```
RuntimeError: Can't detach views in-place. Use detach() instead
```

So I tried to replace `zero_grad` with manually setting `grad = None`:

```
for i in range(latent.shape[0]):
    opt.param_groups[i]["params"][0].grad = None  # manual zero_grad
latent.grad = None  # manual zero_grad
loss.backward()
for i in range(latent.shape[0]):
    opt.param_groups[i]["params"][0].grad = latent.grad[i]
```

The code runs without errors, but the results are very bad. There still seems to be an issue with the gradients, but I don't know what is going wrong.

Or is there any other way to have a custom learning rate for each sample in a batch?
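For example, I wondered whether, at least for plain SGD, per-sample learning rates could be emulated by scaling `latent.grad` row-wise before a manual update. A sketch with a placeholder loss (note that this equivalence would not hold for Adam, because of its per-parameter running statistics):

```python
import torch

torch.manual_seed(0)
B, D = 4, 8
latent = torch.randn(B, D, requires_grad=True)
groundtruth = torch.randn(B, D)
lrs = torch.linspace(1e-3, 1e-1, B)  # hypothetical per-sample learning rates

loss = ((latent * 2.0 - groundtruth) ** 2).mean()  # placeholder loss
loss.backward()

with torch.no_grad():
    # Manual SGD step: each row of the gradient is scaled by its
    # own learning rate before being applied.
    latent -= lrs.view(-1, 1) * latent.grad
latent.grad = None  # manual zero_grad
```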