Custom activation functions with trainable parameters

I am trying to implement this paper https://arxiv.org/pdf/1801.09403.pdf
The idea is to define a custom activation function as a combination of basic activations such as ReLU and tanh, with an affine or convex constraint on the combination weights.
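Roughly, the combined activation is f(x) = alpha_1*g_1(x) + ... + alpha_N*g_N(x), where the trainable weights alpha_i should sum to 1 for an affine combination, and additionally be non-negative (e.g. via a softmax) for a convex combination.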

Here is the class structure I created, but I can't make it work:

class combine_act(nn.Module):
    def __init__(self, in_features, act, comb='affine'):
        super(combine_act, self).__init__()

        self.in_features = in_features
        self.act = act                  # list of activations to use
        self.comb = comb                # type of combination [affine or convex]
        self.N = len(self.act)

        self.alpha = nn.Parameter(torch.randn(len(act)))
        self.alpha.requiresGrad = True

        Lin = lambda x: x

        activations = {'linear': Lin, 'relu': nn.ReLU(), 'tanh': nn.Tanh()}

    def my_softmax(self, x):
        means = torch.mean(x, 1, keepdim=True)[0]
        x_exp = torch.exp(x - means)
        x_exp_sum = torch.sum(x_exp, 1, keepdim=True)
        return x_exp / x_exp_sum

    def get_alpha(self):
        self.eps = 1e-7
        if comb == 'affine':
            return self.alpha / self.alpha.sum(1, keepdim=True).clamp(min=self.eps)
        else:
            return self.my_softmax(self.alpha)

    def forward(self, x):
        out = x - x
        weight = self.get_alpha()
        for i in range(self.N):
            activation = activations[self.act[i]]
            out += weight[i] * activation(x)
        return out

A call to this module would look like this:

x = nn.Linear(100, 100)(x)
x = combine_act(['linear', 'relu', 'tanh'], comb='affine')(x)


I am getting this error: 'combine_act' object has no attribute 'dim'

Thanks in advance!

It seems you've posted an older version of the code, as it's currently not executable: e.g. self.eps is defined outside of a method, and you are using type=affine, which is also undefined. Could you post the current code again?

@ptrblck Hi, please see the modified code.

class combine_act(nn.Module):
    def __init__(self, in_features, comb='convex'):
        super(combine_act, self).__init__()

        self.in_features = in_features
        self.comb = comb                     # type of combination [affine or convex]

        Lin = lambda x: x

        self.activations = {Lin, nn.ReLU(), nn.Tanh()}       # list of activations to use

        self.alpha = nn.Parameter(torch.randn(len(self.activations)))
        self.alpha.requiresGrad = True

    def my_softmax(self, x):
        means = torch.mean(x)
        x_exp = torch.exp(x - means)
        x_exp_sum = torch.sum(x_exp)
        return x_exp / x_exp_sum

    def unitnorm(self, x):
        eps = 1e-7
        return x / x.sum().clamp(min=eps)

    def get_alpha(self, a):
        if self.comb == 'affine':
            return self.unitnorm(a).cuda()
        else:
            return self.my_softmax(a).cuda()

    def forward(self, x):
        out = x - x
        weight = self.get_alpha(self.alpha)
        for i, activation in enumerate(self.activations):
            out += weight[i] * activation(x)
        return out

@ptrblck I have a working prototype now (see the code above). I still have to do some extensive tests.

But there is still an issue with the update of the weights.
In the forward function I am computing weight from the trainable parameter alpha. However, what I want is to update alpha directly, i.e., do a gradient update on alpha via autograd and then rescale its values according to the convex/affine constraint, so that alpha itself is the correct model parameter.
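To make that concrete, this is roughly the update scheme I have in mind, using the combine_act module above (just a sketch; the re-projection after optimizer.step() is my own assumption of how it could be done, and the optimizer/loss are only placeholders):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(100, 100), combine_act(100, comb='affine')).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

out = model(torch.randn(8, 100).cuda())
optimizer.zero_grad()
out.mean().backward()          # dummy loss, just for illustration
optimizer.step()               # autograd + optimizer update alpha directly

# afterwards, rescale alpha in place so it satisfies the affine/convex constraint again
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, combine_act):
            m.alpha.copy_(m.get_alpha(m.alpha))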

Any suggestions?

Thanks

self.act is still undefined. Should I replace it with self.activations?
Would the code to execute this module be x = combine_act(100, actcomb='affine')(x)?

@ptrblck Sorry for the bug. Yes, you are right, and I have updated the code now too.

There is no need for self.act now; I will replace it with a named dictionary later. Right now I am hard-coding the activations I want to use in self.activations.
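For reference, the named dictionary I have in mind would look roughly like my first version (just a sketch of fragments from __init__ and forward, not tested):

# inside __init__, with act e.g. ['linear', 'relu', 'tanh']
self.act = act
self.activations = {'linear': lambda x: x, 'relu': nn.ReLU(), 'tanh': nn.Tanh()}

# inside forward, looking the activations up by name
for i, name in enumerate(self.act):
    out += weight[i] * self.activations[name](x)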
Here is how you would call it

x = nn.Linear(100, 100)(x)
x = combine_act(100,comb='affine')(x)

Thanks for the code.
It seems alpha gets valid gradients (they look pretty high, but I'm not sure if that's desired in your use case).

class combine_act(nn.Module):
    def __init__(self,in_features,comb='convex'):
        super(combine_act, self).__init__()
    
        self.in_features = in_features
        self.comb = comb                     #type of combination [affine or convex]
    
        Lin = lambda x: x

        self.activations = {Lin, nn.ReLU(), nn.Tanh()}           #list of activations to use 

        self.alpha = nn.Parameter(torch.randn(len(self.activations)))

    def my_softmax(self, x):
        means = torch.mean(x)
        x_exp = torch.exp(x-means)
        x_exp_sum = torch.sum(x_exp)
        return x_exp/x_exp_sum

    def unitnorm(self,x):
        eps = 1e-7
        return x / x.sum().clamp(min=eps)

    def get_alpha(self,a):
        if self.comb=='affine':
           return self.unitnorm(a).cuda()
        else:
           return self.my_softmax(a).cuda()

    def forward(self, x):
        out=x-x 
        weight = self.get_alpha(self.alpha)
        for i, activation in enumerate(self.activations):
           out += weight[i]*activation(x)
        return out


x = torch.randn(1, 100).cuda()    
model = combine_act(100, comb='affine').cuda()

x = model(x)
x.mean().backward()
print(model.alpha.grad)

You could simply pass alpha to an optimizer, which would perform the update step.
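Something like this should work (a minimal sketch, reusing the module above; the optimizer and learning rate are just placeholders):

model = combine_act(100, comb='affine').cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # model.parameters() already contains alpha

out = model(torch.randn(1, 100).cuda())
out.mean().backward()
optimizer.step()                                          # updates alpha from its gradient
print(model.alpha)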
Could you clarify the use case a bit if that wouldn't work?

@ptrblck alpha is a parameter of a custom activation function, and it is different for each layer I use it in, similar to batchnorm. How do I pass it to the optimizer in my training routine? There should be a way to do this automatically, like with linear or batchnorm layers, where we call loss.backward() followed by optimizer.step() and all parameters get updated.
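For context, this is roughly how I am using it inside a bigger model (simplified; the names are made up):

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(100, 100)
        self.act1 = combine_act(100, comb='affine')
        self.fc2 = nn.Linear(100, 10)
        self.act2 = combine_act(10, comb='affine')

    def forward(self, x):
        return self.act2(self.fc2(self.act1(self.fc1(x))))

model = MyNet().cuda()
# I would like every layer's alpha to show up here automatically,
# just like the Linear weights do:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)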

My next query:
I want to get rid of the weight variable and use alpha directly. Since alpha is an nn.Parameter, I can't assign the output of the get_alpha() function to it, and masked_scatter_ gives me CUDA errors.

Thanks