Set constraints on parameters or layers

Shen · August 21, 2018, 8:28pm

Hi, are there any ways in PyTorch to set the range of parameters or values in each layer? For example, is it possible to constrain the range of the linear product Y = WX to [-1, 1]? If not, how about limiting the range of the weights?

I noticed that in Keras, users can do this by setting constraints. Is there an equivalent in PyTorch?
Thanks!

Ranahanocka · August 21, 2018, 9:14pm

You can just clip the weights of the parameters after each optimization update.

class WeightClipper(object):

    def __init__(self, frequency=5):
        self.frequency = frequency

    def __call__(self, module):
        # filter the variables to get the ones you want
        if hasattr(module, 'weight'):
            w = module.weight.data
            w = w.clamp(-1,1)


model = Net()
clipper = WeightClipper()
model.apply(clipper)

created with inspiration from this post

eshgovil · March 13, 2019, 7:11am

I was trying to place explicit weight constraints on my network layers using your suggested approach. I found that nothing was changing until I added the last line below (correct me if I’m wrong):

def __call__(self, module):
    # filter the variables to get the ones you want
    if hasattr(module, 'weight'):
        w = module.weight.data
        w = w.clamp(-1,1)
        module.weight.data = w

Just a heads up for anyone trying to implement this in the future!
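
A minimal sketch of a clipper that modifies the weights in place and is re-applied after every optimizer step, as the first reply suggests. The toy `nn.Linear` model, the random data, and the bounds of ±0.1 are illustrative assumptions, not part of the original posts:

```python
import torch
import torch.nn as nn


class WeightClipper:
    """Clamp the weights of every module that has a `weight` tensor."""

    def __init__(self, low=-1.0, high=1.0):
        self.low = low
        self.high = high

    def __call__(self, module):
        weight = getattr(module, "weight", None)
        if weight is not None:
            # In-place clamp, so the module's own tensor is modified.
            with torch.no_grad():
                weight.clamp_(self.low, self.high)


torch.manual_seed(0)
model = nn.Linear(4, 2)          # hypothetical toy model
clipper = WeightClipper(-0.1, 0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

x = torch.randn(16, 4)           # made-up inputs
y = torch.randn(16, 2)           # made-up targets

for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    model.apply(clipper)         # project the weights back after each update
```

Calling `model.apply(clipper)` only once before training would not keep the weights in range, since later updates can move them out again.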

bf_Lee · June 24, 2019, 7:00pm

Have you solved it? How can I place explicit constraints only on the weights of a specified layer?

bf_Lee · June 24, 2019, 7:02pm

Hi, I have a similar problem. Your code adds constraints to the whole net, but I want to apply the same constraint only to the parameters of a specified layer (e.g., the last layer). Can you give me some suggestions?

Shen · September 30, 2019, 8:24pm

Hi, I tried the method you suggested. However, it seems the module doesn’t have a usable weight attribute (module.weight returns None). Have you encountered this issue? Thx
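
One way this can happen: some layers register `weight` as an attribute but leave it set to `None`, for example normalization layers created with `affine=False`. The sketch below skips such modules instead of relying on `hasattr` alone. The `clip_weights` helper and the small `nn.Sequential` model are illustrative assumptions:

```python
import torch
import torch.nn as nn


def clip_weights(module, low=-1.0, high=1.0):
    # getattr with a default covers modules without a `weight` attribute;
    # the `is not None` check covers modules whose weight is None.
    weight = getattr(module, "weight", None)
    if weight is not None:
        with torch.no_grad():
            weight.clamp_(low, high)


model = nn.Sequential(
    nn.Linear(8, 8),
    nn.BatchNorm1d(8, affine=False),  # has a `weight` attribute, but it is None
    nn.ReLU(),
)
with torch.no_grad():
    model[0].weight.mul_(100.0)       # push weights out of range on purpose

print(hasattr(model[1], "weight"), model[1].weight)
model.apply(clip_weights)
```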

anantguptadbl · November 6, 2019, 6:50am

@bf_Lee

I will take an example

import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.l1 = nn.Linear(100, 50)
        self.l2 = nn.Linear(50, 10)
        self.l3 = nn.Linear(10, 1)
        self.sig = nn.Sigmoid()

    def forward(self, x):
        x = self.l1(x)
        x = self.l2(x)
        x = self.l3(x)
        x = self.sig(x)
        return x

class weightConstraint(object):
    def __init__(self):
        pass

    def __call__(self, module):
        if hasattr(module, 'weight'):
            print("Entered")
            w = module.weight.data
            w = w.clamp(0.5, 0.7)
            module.weight.data = w  # write the clamped tensor back

# Applying the constraints to only the last layer
constraints = weightConstraint()
model = Model()
model._modules['l3'].apply(constraints)

Hope this helps

Debajyoti_Majumdar · May 29, 2020, 11:11am

Is there a way to restrict the values for torch.nn.Parameter?
Since we cannot use apply(fn) for this kind of object.

Aryan_Asadian · October 23, 2020, 1:23pm

You can limit your parameter by feeding it as input to a function, e.g., sigmoid.

my_param= nn.Parameter(torch.empty(1).cuda(), requires_grad=True)
my_param_limited = torch.sigmoid(my_param)

Note the difference in the names of the parameters: using a single name changes the computation graph and makes backpropagation impossible.

ptrblck · October 24, 2020, 8:03am

The output of torch.sigmoid will be a non-leaf tensor and you will lose the nn.Parameter property, so I would recommend applying the sigmoid to the tensor before wrapping it in nn.Parameter (unless you want exactly this behavior).

Nit: torch.empty will use uninitialized memory and the tensor might thus contain invalid values such as NaNs/Infs. torch.sigmoid(NaN) would also output a NaN value, so you should initialize it somehow e.g. using rand(n).

Aryan_Asadian · October 24, 2020, 11:24am

Thank you @ptrblck for your reply. But when I apply sigmoid before wrapping my variable in nn.Parameter, after a few epochs my parameter violates its range, i.e., [0,1]. How can I handle that?

my_param= nn.Parameter(torch.sigmoid(torch.rand(1)).cuda(), requires_grad=True)
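
Initializing inside [0, 1] does not keep the value there, because the optimizer updates the raw tensor without any knowledge of the range. A rough sketch of one workaround is to clamp the parameter in place after every step. The loss below is a made-up objective, chosen only because it pushes the value upward:

```python
import torch
import torch.nn as nn

my_param = nn.Parameter(torch.rand(1))        # starts inside [0, 1]
optimizer = torch.optim.SGD([my_param], lr=0.5)

for _ in range(20):
    optimizer.zero_grad()
    loss = (my_param - 3.0).pow(2).sum()      # hypothetical loss with its minimum at 3
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        my_param.clamp_(0.0, 1.0)             # project back into [0, 1]

print(my_param.item())
```

The parameter stays a leaf tensor here, so the optimizer keeps updating it, and it ends up at the upper bound rather than drifting past it.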

googlebot · October 24, 2020, 1:02pm

It is impossible to declare a constrained parameter in PyTorch. So, in __init__ an unconstrained parameter is declared, e.g.:
self.my_param = nn.Parameter(torch.zeros(1))

And in forward(), you do the transformation:
my_param_limited = torch.sigmoid(self.my_param)

This is a dynamically created tensor stored in a local variable.
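
A small sketch of this pattern, to illustrate that the parameter still receives gradients even though the sigmoid output is a non-leaf tensor. The `GammaMix` module and the constant loss values are hypothetical stand-ins, not code from the thread:

```python
import torch
import torch.nn as nn


class GammaMix(nn.Module):
    """Mix two loss terms with a learnable weight kept inside (0, 1)."""

    def __init__(self):
        super().__init__()
        # Unconstrained leaf parameter; sigmoid(0) = 0.5.
        self.my_param = nn.Parameter(torch.zeros(1))

    def forward(self, loss_a, loss_b):
        # Non-leaf tensor, recomputed on every call.
        gamma = torch.sigmoid(self.my_param)
        return (gamma * loss_a + (1 - gamma) * loss_b).sum()


mix = GammaMix()
total = mix(torch.tensor(2.0), torch.tensor(1.0))  # made-up loss values
total.backward()

# The gradient flows through the sigmoid back to the leaf parameter.
print(mix.my_param.is_leaf, mix.my_param.grad)
```

The optimizer only ever sees `my_param`, which can take any real value, while everything downstream uses the squashed value.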

Aryan_Asadian · October 24, 2020, 2:52pm

Thank you for your reply. The transformation that you did in forward(), as @ptrblck mentioned, makes it a non-leaf tensor. I am wondering how we can learn a parameter that should be constrained. For example, we have a custom loss function that is a combination of cross-entropy and MSE loss.

def __init__(self):
    self.my_param = nn.Parameter(torch.empty(1).cuda(), requires_grad=True)

def forward(self):
    gamma = torch.sigmoid(self.my_param)

    total_loss = gamma * cross_entropy_loss + (1 - gamma) * MSE()
    return total_loss

So in this way, my_param cannot be trained meaningfully, right?

googlebot · October 24, 2020, 3:52pm

Built-in optimizers have no idea about constraints, so any other solution would be more cumbersome.

You can improve things a bit:

def __init__(self):
    super().__init__()
    self.my_param = nn.Parameter(torch.logit(torch.tensor([0.5])))  # inverse of sigmoid

@property
def gamma(self):
    return self.my_param.sigmoid()

though I don’t like uncached @property there

PS: cuda() can be omitted and done on a module, and requires_grad=True is unnecessary

Y_Y · May 22, 2021, 4:30am

Could use module.weight.data.clamp_(-clip,clip)

Jonas_ar · March 16, 2022, 10:30am

I want to do the same with a specific layer of MobileNetV3, but I struggle to find the right module name to apply my constraints to.

As an example:
This is the first Inverted Residual block from the MobileNetV3, I only want to set constraints for the Conv2D layer.

(1): InvertedResidual(
  (block): Sequential(
    (0): ConvNormActivation(
      (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): ConvNormActivation(
      (0): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)

So basically the model._modules['l3'].apply() from your example should become something like
model._modules['conv2d layer in (block)sequential in (0)ConvNormActivation'].apply().

I tried to use:

for name, module in model.named_modules():
    print(name)

to get the module name. But the module name "features.1.block.0.0" for the weights I want to constrain (weight name: "features.1.block.0.0.weight") throws a KeyError:

model._modules['features.1.block.0.0'].apply()

I don’t want to constrain all weights within the model to the same boundaries, since the value range is quite different in some layers. Without different boundaries for different layers, I would cut off some weights after training.

For reference, I am injecting bit flips into the weights and therefore want to constrain the weights to their maximum and minimum values after training. That way, if a bit flip occurs, the value does not go beyond the max and min, and the model can rely on the inherent fault resilience of DNNs to small value changes.

Any help would be greatly appreciated!
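
One possible sketch for per-layer bounds: record each layer's own minimum and maximum weight after training, then clamp back to those values later. The small `nn.Sequential` model and the helper names `record_bounds` and `clamp_to_bounds` are assumptions for illustration, and the simulated fault is just one large value written into a weight:

```python
import torch
import torch.nn as nn


def record_bounds(model):
    """Store the (min, max) of every weight tensor, keyed by module name."""
    bounds = {}
    for name, module in model.named_modules():
        weight = getattr(module, "weight", None)
        if weight is not None:
            bounds[name] = (weight.min().item(), weight.max().item())
    return bounds


def clamp_to_bounds(model, bounds):
    """Clamp each recorded layer back into its own range."""
    modules = dict(model.named_modules())
    with torch.no_grad():
        for name, (low, high) in bounds.items():
            modules[name].weight.clamp_(low, high)


model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU(), nn.Conv2d(4, 4, 1))
bounds = record_bounds(model)

with torch.no_grad():
    model[0].weight[0, 0, 0, 0] = 1e6   # simulated fault in the first conv

clamp_to_bounds(model, bounds)
```

Because the bounds are keyed by the names from `named_modules()`, each layer keeps its own range instead of sharing one global limit.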

Jonas_ar · March 17, 2022, 1:26pm

New update in that regard, accessing the first features “features.0” works with

print(model._modules['features'][0])

with the output:

ConvNormActivation(
  (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
  (2): Hardswish()
)

whereas

print(model._modules['features'][0][0])

accesses only the Conv2d, which is what I want.

But for the second feature, “features.1”, the same syntax does not work, since this feature is an InvertedResidual and I get the error “'InvertedResidual' object is not subscriptable”.

print(model._modules['features'][1])

InvertedResidual(
  (block): Sequential(
    (0): ConvNormActivation(
      (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=16, bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (1): ConvNormActivation(
      (0): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
    )
  )
)
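
A note on why both attempts fail: `_modules` only holds a module's direct children, so a dotted name such as `features.1.block.0.0` is not a valid key, and `InvertedResidual` cannot be indexed with an integer because it is not a `Sequential`. Below is a sketch of two ways to reach the nested layer. It uses a hypothetical stand-in model (`Block`, `TinyNet`) that mimics the nesting rather than the real MobileNetV3, and `get_submodule` assumes a reasonably recent PyTorch release:

```python
import torch.nn as nn


class Block(nn.Module):
    """Stand-in for InvertedResidual: children live under the `block` attribute."""

    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Sequential(nn.Conv2d(16, 16, 3, padding=1, groups=16, bias=False),
                          nn.BatchNorm2d(16), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 16, 1, bias=False), nn.BatchNorm2d(16)),
        )


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1, bias=False),
                          nn.BatchNorm2d(16), nn.Hardswish()),
            Block(),
        )


model = TinyNet()

# Option 1: attribute access for named children, indexing for Sequential.
conv_a = model.features[1].block[0][0]

# Option 2: look the module up by its dotted name.
conv_b = dict(model.named_modules())["features.1.block.0.0"]
conv_c = model.get_submodule("features.1.block.0.0")

print(conv_a is conv_b, conv_a is conv_c)
conv_a.weight.data.clamp_(-1, 1)   # constrain only this layer
```

The dotted names printed by `named_modules()` can be used directly in option 2, which avoids working out by hand which levels are indexable.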