Reusing Tensor and Optimizer in nested loops

My question pertains to the implementation of adversarial attacks that use Adam or SGD optimizers on the perturbation tensor, in particular attacks such as Carlini-Wagner (the advertorch implementation). Attacks that require two levels of optimization have the general structure:

for outer_step in range(outer_max):
    # Fresh perturbation and optimizer for every outer iteration
    delta = nn.Parameter(torch.zeros_like(x))
    optimizer = optim.SGD([delta], lr=0.01)
    for inner_step in range(inner_max):
        # Some stuff (compute loss from x + delta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

I would like to do the following instead:

# Initialize the tensor and its optimizer once, outside the loop
delta = torch.zeros_like(x, requires_grad=True)
optimizer = optim.SGD([delta], lr=0.001)
for outer_step in range(outer_max):
    # Zero the tensor instead of creating a new one
    with torch.no_grad():
        delta.data.sub_(delta)
    for inner_step in range(inner_max):
        # Some stuff
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

I’ve done some basic testing successfully, but to be honest I’m not sure under which conditions this might break.

Does anyone have suggestions on whether this is safe to do, and if not, whether there is an alternative that still uses optimizers?

You should never use .data. The with torch.no_grad() block is the right approach here.
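
For example, a minimal sketch of the reset without .data (using zero_() is just one choice here; any in-place update inside the no_grad block works the same way):

with torch.no_grad():
    delta.zero_()  # in-place reset; ops under no_grad are not recorded by autograd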

This looks good to me. Keep in mind, though, that if you use optimizers that have state (SGD with momentum, Adam, etc.), that state will persist from one inner loop to the next.
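
If that carry-over is unwanted, one possible workaround (a sketch, not an official API; Optimizer.state is an implementation detail) is to clear the state at the top of each outer iteration:

from collections import defaultdict

for outer_step in range(outer_max):
    with torch.no_grad():
        delta.zero_()
    # Drop momentum / Adam moment buffers accumulated in previous outer steps
    optimizer.state = defaultdict(dict)
    # ... inner loop as before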

I had originally resorted to using .data because of the error a leaf Variable that requires grad has been used in an in-place operation. But I see that with torch.no_grad() takes care of this problem!
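
For reference, a minimal standalone repro of that error and the no_grad fix (x and the attack code are omitted; the shape is arbitrary):

import torch

delta = torch.zeros(3, requires_grad=True)
# delta.zero_()  # RuntimeError: in-place operation on a leaf Variable that requires grad
with torch.no_grad():
    delta.zero_()  # fine: the operation is not tracked by autograd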

As for the optimizer state, that is another challenge I’m working on by building attack-specific optimizers, which is only tangentially related to this topic.

Thanks!
