How to do projected gradient descent?

Hi, I want to do a constrained optimization with PyTorch. I want to find the minimum of a function $f(x_1, x_2, \dots, x_n)$, with \sum_{i=1}^n x_i=5 and x_i \geq 0. I think this could be done via Softmax. So I follow the How to do constrained optimization in PyTorch

import torch
from torch import nn

x = torch.rand(2)
x.requires_grad = True
lin = nn.Linear(2, 1)
optimizer = torch.optim.Adam([x], lr=0.1)
for i in range(100):
    optimizer.zero_grad()
    y = lin(x)
    y.backward()
    optimizer.step()

    with torch.no_grad():
        x = nn.Softmax(dim=-1)(x) * 5

If print(y) in each step,the output is:

tensor([-0.2826], grad_fn=<AddBackward0>)
tensor([-0.9759], grad_fn=<AddBackward0>)
tensor([-0.9794], grad_fn=<AddBackward0>)
tensor([-0.9880], grad_fn=<AddBackward0>)
tensor([-1.0062], grad_fn=<AddBackward0>)
tensor([-1.0284], grad_fn=<AddBackward0>)
tensor([-1.0360], grad_fn=<AddBackward0>)
tensor([-1.0368], grad_fn=<AddBackward0>)
tensor([-1.0368], grad_fn=<AddBackward0>)
tensor([-1.0368], grad_fn=<AddBackward0>)
tensor([-1.0368], grad_fn=<AddBackward0>)
tensor([-1.0368], grad_fn=<AddBackward0>)
...

The following value does not change.

for i in lin.parameters():
    print(i)

The weight of the linear layer is

Parameter containing:
tensor([[-0.2240, -0.1983]], requires_grad=True)
Parameter containing:
tensor([0.0823], requires_grad=True)

So the formula is $y=-0.224x_1-0.1983x_2+0.0823=-0.0257*x_1-0.9092$,whose minimum should be -1.0377. Is the above code wrong? How to modify? Thank you very much.

Hi,

Is there a reason you use softmax instead of clamp? That is not doing the projection of pgd right?

Hiiiii Sakuraiiiii!

The short answer is that Softmax isn’t the right way to enforce your
constraint.

The medium-short answer is that you’re getting almost the right
answer, but for the wrong reason.

The somewhat longer short answer is that Softmax isn’t the right
way to enforce your constraint.

Some details:

Because you’re minimizing a linear function with respect to linear
constraints, the minimum will occur when the constraints are
saturated.

Specifically, as you have worked out, the minimum occurs when
x_1 = 5.0 and x_2 = 0.0, and the minimum is -1.0377.

The short-answer details are that your constraint enforcement
doesn’t map your solution back to itself:

5.0 * softmax ([5.0, 0.0]) = [4.9665, 0.0335]

and if you iterate 5.0 * softmax() several times, you get
[4.9641, 0.0359]. The value of your function for this value
of (x_1, x_2) is -1.03678, as returned by your training.

Each step of your training is, in effect, iterating your constraint
enforcement. (For reasons I won’t go into, your constraint
enforcement pretty much washes away the changes in x_1
and x_2 made by your optimizer step.)

Why are you getting almost the right answer? Softmax pushes
one value of x_i (the largest) towards 1, and the others towards
0. This is where your constraints are saturated, and therefore where
the minimum lies. That is, in your two-variable case, 5 * Softmax
is pushing x_1 close to 5 (specifically to 4.9641), and x_2 close
to 0.

The longer answer:

Even though 5 * Softmax does enforce your constraint (variables
are greater than 0 and sum to 5), it is, in general, not the right way
to enforce such constraints.

The problem (among other things) is that applying Softmax a second
time changes the x_i a second time. If your constraint is already
satisfied, reapplying the constraint shouldn’t changes things.

Related to this, if you’re at the correct minimum ([5, 0]) applying
the constraint moves you away from the minimum (even though
the constraint was properly satisfied). This clearly isn’t what you
want. (It’s only by luck that because of the specific function you
were minimizing you got almost the right answer using Softmax.)

To enforce your constraint, I would recommend something like:

x_i = (5.0 - sum (x_i)) / n
x_i = max (0.0, x_i)
x_i = (5.0 / sum (x_i)) * x_i

(As an aside, you can also add terms to your loss function that push
the x_i in the direction of satisfying your constraints, and in the
case of your equality constraint, sum (x_i) = 5, you could add
a Langrange-multiplier term to your loss function to enforce the
equality constraint.)

Good luck.

K. Frank

2 Likes

Thank you very much. That’s very clear.