What’s the proper way to do constrained optimization in PyTorch?
For example, I want each parameter of my model to be bounded both from above and below by some constants cLow and cHigh.
That is, if W is the d-dimensional (flattened) weight vector of my model, I’d like to enforce
cLow < W[i] < cHigh for i = 1, 2, … d. How can I do that?
You can do projected gradient descent by enforcing your constraint after each optimizer step. An example training loop would be:
import torch
from torch import optim

opt = optim.SGD(model.parameters(), lr=0.1)
for i in range(1000):
    out = model(inputs)
    loss = loss_fn(out, labels)
    print(i, loss.item())
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Project the parameters back onto the feasible set after each update.
    with torch.no_grad():
        for param in model.parameters():
            param.clamp_(-1, 1)
The last three lines project the weights back into the range [-1, 1] after every update; for the bounds in your question you would use param.clamp_(cLow, cHigh) instead.
I am also working on constrained optimization problems. My experience with this suggestion has not been positive. When I don’t use clamp_() and train the model with no restriction, the values of the specific weights I am interested in end up close to the desired range and the model makes good predictions. But after adding clamp_(), the model’s performance degrades severely. What could be the possible reason for this?
One reason is that this clamping is not communicated to the optimizer, and in particular part of the gradient step is simply thrown away by the projection. So the optimizer falsely believes that it has moved the parameter in a certain direction, when in fact the parameter has been clamped back to the same value as before.
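Here is a toy sketch (not from this thread) that makes the mismatch visible with plain SGD plus momentum: the gradient keeps pushing a single parameter past the upper bound, the projection pins it at the bound, and the optimizer's momentum buffer keeps growing as if the parameter were still moving.

import torch
from torch import optim

# Single parameter just below the upper bound of 1.0.
p = torch.nn.Parameter(torch.tensor([0.9]))
opt = optim.SGD([p], lr=0.1, momentum=0.9)

for step in range(5):
    loss = -p.sum()          # gradient is -1, so every update tries to increase p
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        p.clamp_(-1.0, 1.0)  # project back onto the feasible set
    # The momentum buffer keeps accumulating "movement" that the projection undoes.
    print(step, p.item(), opt.state[p]["momentum_buffer"].item())

Running this prints a parameter stuck at 1.0 while the magnitude of the momentum buffer grows every step, which is exactly the disconnect described above.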
I’ve worked quite a bit on combining PyTorch with constrained optimization, although in my case the constraints are nonlinear functions of the parameters rather than the bound constraints discussed here. The temporary solution I settled on was to use SciPy’s optimize routines and have PyTorch compute the gradients.
The code for this was part of a more general paper, but the main function is here.
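For readers who just want the flavor of that approach, here is a minimal sketch (not the linked code) of driving scipy.optimize.minimize with L-BFGS-B box bounds while PyTorch supplies the loss and its gradient; model, inputs, labels, loss_fn, cLow and cHigh are assumed to be defined as in the question, and the model is assumed to live on the CPU.

import numpy as np
import torch
from scipy import optimize

params = [p for p in model.parameters() if p.requires_grad]
sizes = [p.numel() for p in params]

def unflatten(x):
    # Copy the flat SciPy vector back into the model's parameters.
    tensors = torch.from_numpy(x).float().split(sizes)
    with torch.no_grad():
        for p, t in zip(params, tensors):
            p.copy_(t.reshape(p.shape))

def objective(x):
    # Return (loss, gradient) as float/float64, as SciPy expects with jac=True.
    unflatten(x)
    loss = loss_fn(model(inputs), labels)
    model.zero_grad()
    loss.backward()
    grad = torch.cat([p.grad.reshape(-1) for p in params])
    return loss.item(), grad.double().numpy()

x0 = torch.cat([p.detach().reshape(-1) for p in params]).double().numpy()
res = optimize.minimize(objective, x0, jac=True, method="L-BFGS-B",
                        bounds=[(cLow, cHigh)] * x0.size)
unflatten(res.x)  # write the optimized, feasible parameters back into the model

Here the bounds are enforced by the solver itself at every iterate, rather than by post-hoc clamping after each step.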
Another option would be a similar (more professional) wrapper of PETSc: