My question pertains to the implementation of adversarial attacks that use Adam or SGD optimizers on the perturbation tensor, in particular attacks such as Carlini-Wagner (advertorch implementation). Attacks that require two levels of optimization have the following general structure:

```
for outer_step in range(outer_max):
    # A fresh perturbation and a fresh optimizer on every outer step
    delta = nn.Parameter(torch.zeros_like(x))
    optimizer = optim.SGD([delta], lr=0.01)
    for inner_step in range(inner_max):
        # Some stuff
        loss.backward()
        optimizer.step()
```

I would like to do the following instead:

```
# Initialize a tensor and its optimizer outside the loop
delta = torch.zeros_like(x, requires_grad=True)
optimizer = torch.optim.SGD([delta], lr=.001)
for outer_step in range(outer_max):
    # Zero the tensor in place instead of creating a new one
    with torch.no_grad():
        delta.zero_()
    for inner_step in range(inner_max):
        # Some stuff
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

I’ve done basic testing successfully, but to be honest I’m not sure under which conditions this might break.
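A minimal sketch of one way to compare the two patterns, assuming a toy quadratic loss in place of the real attack objective. The helper names `run_fresh` and `run_reused` and the `reset_state` flag are made up for illustration, and clearing `optimizer.state` is only an assumption about what a reset needs for SGD, not a general recipe for every optimizer:

```python
import torch


def toy_loss(x, delta, target):
    # Stand-in for the real attack loss (assumption for illustration only)
    return ((x + delta - target) ** 2).sum()


def run_fresh(x, target, outer_max=3, inner_max=5, lr=0.1, momentum=0.0):
    """Pattern 1: new delta and new optimizer on every outer step."""
    for _ in range(outer_max):
        delta = torch.zeros_like(x, requires_grad=True)
        optimizer = torch.optim.SGD([delta], lr=lr, momentum=momentum)
        for _ in range(inner_max):
            optimizer.zero_grad()
            toy_loss(x, delta, target).backward()
            optimizer.step()
    return delta.detach().clone()


def run_reused(x, target, outer_max=3, inner_max=5, lr=0.1, momentum=0.0,
               reset_state=False):
    """Pattern 2: one delta and one optimizer, zeroed in place each outer step."""
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.SGD([delta], lr=lr, momentum=momentum)
    for _ in range(outer_max):
        with torch.no_grad():
            delta.zero_()
        if reset_state:
            # Drop per-parameter buffers (e.g. the momentum buffer) so the
            # optimizer starts this outer step without history
            optimizer.state.clear()
        for _ in range(inner_max):
            optimizer.zero_grad()
            toy_loss(x, delta, target).backward()
            optimizer.step()
    return delta.detach().clone()


torch.manual_seed(0)
x = torch.randn(4)
target = torch.randn(4)

# Plain SGD keeps no per-parameter state, so both patterns should agree
plain_match = torch.allclose(run_fresh(x, target), run_reused(x, target))

# With momentum, the reused optimizer carries its buffer across outer steps
momentum_match = torch.allclose(
    run_fresh(x, target, momentum=0.9),
    run_reused(x, target, momentum=0.9),
)

# Clearing the optimizer state at each outer step removes that carry-over
reset_match = torch.allclose(
    run_fresh(x, target, momentum=0.9),
    run_reused(x, target, momentum=0.9, reset_state=True),
)

print(plain_match, momentum_match, reset_match)
```

The point of the comparison is that zeroing `delta` only resets the tensor's values. Anything the optimizer tracks internally, such as a momentum buffer in SGD or the moment estimates in Adam, survives across outer steps unless it is reset as well, which is one situation where the reused pattern could behave differently from the original.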

Does anyone have suggestions on whether this is safe to do and, if not, whether there is an alternative that still uses optimizers?