Need help implementing a proximal operator

I plan to implement the paper "Learning to Share" in PyTorch, and I need some advice on how to properly implement the proximal operator. It seems related to the optimizer in PyTorch. Here is the equation for the gradient update in the paper:

For more detail, you can check the paper. Currently I have no clue how to do it, as the gradient computation is wrapped in a proximal gradient operator.


As far as I understand, you don't actually modify the objective function being minimized, but rather change the update rule (the lower rule is basically SGD for b; the upper one now applies the prox instead of "just updating" W as in SGD). Thus you would not need the gradient of the prox.
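To make that concrete, here is a minimal sketch of a proximal gradient step. I'm using ℓ1 soft-thresholding as a placeholder prox; the paper's actual operator (and the regularization strength `lam`) will differ, so treat this as an illustration of the pattern, not the paper's method:

```python
import torch

def prox_l1(w, thresh):
    """Soft-thresholding: the proximal operator of thresh * ||w||_1.
    prox(w) = sign(w) * max(|w| - thresh, 0)."""
    return torch.sign(w) * torch.clamp(w.abs() - thresh, min=0.0)

def prox_grad_step(w, grad, lr, lam):
    """One proximal gradient update: plain gradient step, then the prox.
    The gradient of the prox itself is never needed."""
    return prox_l1(w - lr * grad, lr * lam)
```

Note that autograd only has to supply `grad` of the smooth loss; the prox is applied outside the graph.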

I think they have code up, too. I didn't look at it in detail, but it seems broadly consistent with the interpretation above:

Best regards


Hi Thomas,
Thanks for your reply. Can I separate the update into two parts? First, use a torch optimizer to update the weights as in a regular training step. Then pass the weights to the proximal operator and overwrite the model's weights with the result. Does this process seem reasonable to you?

Though I think it might be slower than directly modifying the update rule in SGD.
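The two-phase process described above could be sketched like this. Again I'm substituting an ℓ1 soft-thresholding prox for the paper's operator, and `lam` is a made-up regularization strength, purely for illustration:

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 0.01  # hypothetical regularization strength

def prox_l1_(p, thresh):
    # In-place soft-thresholding; stand-in for the paper's prox.
    p.copy_(torch.sign(p) * torch.clamp(p.abs() - thresh, min=0.0))

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()                       # phase 1: regular optimizer update
with torch.no_grad():            # phase 2: reset weights via the prox
    prox_l1_(model.weight, opt.param_groups[0]["lr"] * lam)
```

The `torch.no_grad()` context is important: the prox overwrite is a raw parameter edit and should not be recorded by autograd.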


Yes, the easiest is probably to pass the subset of your parameters that needs only plain SGD to a stock SGD optimizer, and to loop over the others and do the prox update yourself. Once you have the proximal operator, that will be the easy part. :slight_smile:

All the best for your project, it certainly seems to be very interesting!

Best regards