I am trying to implement this paper https://arxiv.org/pdf/1801.09403.pdf
The idea is to define a custom activation function as a learnable combination of basic functions such as relu, tanh, etc., under affine or convex constraints.
Here is a class structure I created, but I can’t make it work:
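Written out, the combination described above looks as follows. The notation is assumed here and not taken from the paper: $g_i$ are the base activations and $\alpha_i$ are the learnable coefficients. In the code in this thread, `get_alpha()` is the function that maps the raw `alpha` to coefficients satisfying one of these constraints.

```latex
f(x) = \sum_{i=1}^{N} \alpha_i \, g_i(x), \qquad g_i \in \{\mathrm{id}, \mathrm{ReLU}, \tanh, \dots\}

\text{affine: } \sum_{i=1}^{N} \alpha_i = 1
\qquad
\text{convex: } \sum_{i=1}^{N} \alpha_i = 1, \; \alpha_i \ge 0 \ \text{for all } i
```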
```python
class combine_act(nn.Module):
    def __init__(self, in_features,act,comb='affine'):
        super(combine_act,self).__init__()
        self.in_features = in_features
        self.act = act #list of activations to use
        self.comb = comb #type of combination [affine or convex]
        self.N = len(self.act)
        self.alpha = nn.Parameter(torch.randn(len(act)))
        self.alpha.requiresGrad = True
        Lin = lambda x: x
        activations = {'linear': Lin, 'relu': nn.ReLU(), 'tanh': nn.Tanh()}

    def my_softmax(self, x):
        means = torch.mean(x, 1, keepdim=True)[0]
        x_exp = torch.exp(x-means)
        x_exp_sum = torch.sum(x_exp, 1, keepdim=True)
        return x_exp/x_exp_sum

    def get_alpha(self):
        self.eps = 1e-7
        if comb=='affine':
            return self.alpha / self.alpha.sum(1, keepdim=True).clamp(min=self.eps)
        else:
            return self.my_softmax(self.alpha)

    def forward(self, x):
        out=x-x
        weight = self.get_alpha()
        for i in range(self.N):
            activation = activations[self.act[i]]
            out += weight[i]*activation(x)
        return out
```
A call to this module would look like this:
```python
x = nn.Linear(100,100)(x)
x = combine_act(['linear', 'relu', 'tanh'],comb='affine')(x)
```
I am getting this error: ‘combine_act’ object has no attribute ‘dim’
It seems you’ve posted an older version of the code, as it’s currently not executable: e.g., self.eps is defined outside of a method, and you are using type=affine, which is also undefined. Could you post the current code again?
```python
import torch
import torch.nn as nn

class combine_act(nn.Module):
    def __init__(self,in_features,comb='convex'):
        super(combine_act, self).__init__()
        self.in_features = in_features
        self.comb = comb #type of combination [affine or convex]
        Lin = lambda x: x
        # a list (not a set) keeps the order fixed, so weight[i] always matches the same activation
        self.activations = [Lin, nn.ReLU(), nn.Tanh()] #list of activations to use
        # nn.Parameter already has requires_grad=True, so no extra flag is needed
        self.alpha = nn.Parameter(torch.randn(len(self.activations)))

    def my_softmax(self, x):
        means = torch.mean(x)
        x_exp = torch.exp(x-means)
        x_exp_sum = torch.sum(x_exp)
        return x_exp/x_exp_sum

    def unitnorm(self,x):
        eps = 1e-7
        # caution: a negative sum is clamped up to eps here
        return x / x.sum().clamp(min=eps)

    def get_alpha(self,a):
        # alpha already lives on the module's device, so no .cuda() call is needed
        if self.comb=='affine':
            return self.unitnorm(a)
        else:
            return self.my_softmax(a)

    def forward(self, x):
        out=x-x
        weight = self.get_alpha(self.alpha)
        for i, activation in enumerate(self.activations):
            out += weight[i]*activation(x)
        return out
```
@ptrblck I have a working prototype now, though I still have to run some extensive tests (see code above).
But one issue remains with the weight update.
In the forward function I compute the weights from the trainable parameter alpha. What I want instead is to update alpha directly, i.e., do a gradient update on alpha via autograd and then rescale its values according to the convex/affine constraint, so that alpha itself is the correct model parameter.
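One way to get "update alpha, then restore the constraint" is projected gradient descent: take an ordinary optimizer step on `alpha`, then replace it in place with its Euclidean projection onto the constraint set. The sketch below is an illustration under that assumption, not the paper's method. `project_affine` and `project_simplex` are hypothetical helper names, the simplex projection uses the standard sort-based algorithm, and the toy target `tanh(x)` exists only to produce a gradient.

```python
import torch


def project_affine(alpha):
    # Euclidean projection onto the hyperplane {a : sum(a) == 1}
    return alpha - (alpha.sum() - 1) / alpha.numel()


def project_simplex(alpha):
    # Euclidean projection onto {a : a >= 0, sum(a) == 1} (sort-based algorithm)
    u, _ = torch.sort(alpha, descending=True)
    cssv = torch.cumsum(u, dim=0) - 1
    ind = torch.arange(1, alpha.numel() + 1, dtype=alpha.dtype, device=alpha.device)
    rho = int((u - cssv / ind > 0).nonzero()[-1])  # last index where the test holds
    theta = cssv[rho] / (rho + 1)
    return torch.clamp(alpha - theta, min=0)


# Projected gradient descent on a standalone alpha (convex case)
torch.manual_seed(0)
alpha = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([alpha], lr=0.1)
x = torch.randn(32)
target = torch.tanh(x)  # toy target, only to have something to fit

with torch.no_grad():
    alpha.copy_(project_simplex(alpha))  # start from a feasible point

for _ in range(20):
    basis = torch.stack([x, torch.relu(x), torch.tanh(x)])  # shape (3, 32)
    out = (alpha[:, None] * basis).sum(dim=0)
    loss = (out - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()  # unconstrained gradient step on alpha
    with torch.no_grad():
        alpha.copy_(project_simplex(alpha))  # restore the constraint in place
```

For the affine case, swap `project_simplex` for `project_affine`. Note that dividing by the sum (as `unitnorm` does) also enforces the affine constraint, but it is a rescaling rather than a projection.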
self.act is still undefined? Should I replace it with self.activations?
Would the code to execute this module be x = combine_act(100, actcomb='affine')(x)?
@ptrblck sorry for the bug. Yes, you are right, and I have updated the code now too.
There is no need for self.act now. I will replace it with a named dictionary later; right now I am hard-coding the activations I want to use in self.activations.
Here is how you would call it:
```python
x = nn.Linear(100,100)(x)
x = combine_act(100,comb='affine')(x)
```
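A named-dictionary version could look roughly like the sketch below. It assumes `nn.ModuleDict` for the name-to-activation mapping, which keeps a fixed order and registers each activation as a submodule, and `nn.Identity` in place of the `Lin` lambda. `CombineAct`, `AVAILABLE` and `act_names` are hypothetical names. For the affine case it shifts `alpha` onto the hyperplane sum == 1 instead of dividing by the sum, which sidesteps a division by a near-zero or negative sum. This is a substitution, not what the thread's code does.

```python
import torch
import torch.nn as nn


class CombineAct(nn.Module):
    # Hypothetical name -> activation-class table; extend as needed.
    AVAILABLE = {"linear": nn.Identity, "relu": nn.ReLU, "tanh": nn.Tanh}

    def __init__(self, act_names=("linear", "relu", "tanh"), comb="convex"):
        super().__init__()
        if comb not in ("affine", "convex"):
            raise ValueError(f"unknown comb: {comb!r}")
        self.comb = comb
        self.act_names = list(act_names)
        # ModuleDict registers each activation as a submodule and keeps insertion order
        self.acts = nn.ModuleDict({name: self.AVAILABLE[name]() for name in self.act_names})
        self.alpha = nn.Parameter(torch.randn(len(self.act_names)))

    def get_alpha(self):
        if self.comb == "affine":
            # Shift alpha onto the hyperplane sum == 1 (weights may be negative)
            return self.alpha - (self.alpha.sum() - 1) / self.alpha.numel()
        # Softmax: weights are non-negative and sum to 1
        return torch.softmax(self.alpha, dim=0)

    def forward(self, x):
        w = self.get_alpha()
        return sum(w[i] * self.acts[name](x) for i, name in enumerate(self.act_names))


act = CombineAct(["linear", "relu", "tanh"], comb="affine")
x = torch.randn(4, 100)
y = act(x)
```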
@ptrblck alpha is a parameter of a custom activation function and, like the parameters of a batchnorm layer, it is different in each layer where I use it. How do I pass it to the optimizer in my training routine? There should be a way to do it automatically, as with linear or batchnorm layers, where we call loss.backward() followed by optimizer.step() and all parameters get updated.
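In PyTorch, an `nn.Parameter` assigned as an attribute of an `nn.Module` is registered automatically. So if each activation module is created once and stored as an attribute of the model (or inside `nn.Sequential`/`nn.ModuleList`), every layer's `alpha` appears in `model.parameters()`, and a single optimizer updates it alongside the linear weights. The sketch below assumes a compact, hypothetical `CombineAct` (convex case only) and a hypothetical `Net`. One caution: calling `combine_act(100, comb='affine')(x)` inline, as in the call above, builds a fresh module with a new random `alpha` on every call, and the optimizer never sees those parameters.

```python
import torch
import torch.nn as nn


class CombineAct(nn.Module):
    # Compact stand-in for the thread's module (convex combination only)
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.randn(3))  # registered automatically

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return w[0] * x + w[1] * torch.relu(x) + w[2] * torch.tanh(x)


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Create layers once here, not inside forward()
        self.fc1 = nn.Linear(100, 100)
        self.act1 = CombineAct()  # its own alpha
        self.fc2 = nn.Linear(100, 10)
        self.act2 = CombineAct()  # a separate alpha

    def forward(self, x):
        return self.act2(self.fc2(self.act1(self.fc1(x))))


model = Net()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # includes act1.alpha and act2.alpha
names = [n for n, _ in model.named_parameters()]

before = model.act1.alpha.detach().clone()
loss = model(torch.randn(8, 100)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()  # updates linear weights and both alphas
```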
My next query is:
I want to get rid of the weight variable and use alpha directly. Since alpha is an nn.Parameter, I can’t assign the output of get_alpha() to it, and masked_scatter_ gives me CUDA errors.
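Rather than assigning a new tensor to the parameter, one option is to modify it in place inside `torch.no_grad()`, using ordinary in-place ops such as `sub_`, `clamp_` and `div_`. Autograd allows in-place changes to a leaf parameter in that context. The sketch below assumes a hypothetical `CombineActDirect` whose `forward()` uses `alpha` as-is, plus a hypothetical `constrain_()` method called after each `optimizer.step()`. The convex branch uses a simple clamp-and-renormalize heuristic, which keeps alpha feasible but is not an exact Euclidean projection.

```python
import torch
import torch.nn as nn


class CombineActDirect(nn.Module):
    # Hypothetical variant: forward() uses alpha directly, and constrain_()
    # restores the affine/convex constraint in place after each optimizer step.
    def __init__(self, comb="affine"):
        super().__init__()
        self.comb = comb
        self.alpha = nn.Parameter(torch.full((3,), 1.0 / 3))  # feasible start

    def forward(self, x):
        a = self.alpha  # no separate weight tensor
        return a[0] * x + a[1] * torch.relu(x) + a[2] * torch.tanh(x)

    @torch.no_grad()
    def constrain_(self):
        if self.comb == "affine":
            # Euclidean projection onto sum(alpha) == 1
            self.alpha.sub_((self.alpha.sum() - 1) / self.alpha.numel())
        else:
            # Clamp-and-renormalize: a simple heuristic, not an exact projection
            self.alpha.clamp_(min=1e-7)
            self.alpha.div_(self.alpha.sum())


layer = CombineActDirect(comb="convex")
opt = torch.optim.SGD(layer.parameters(), lr=0.5)
x = torch.randn(64)
loss = (layer(x) - torch.relu(x)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()          # plain gradient step on alpha
layer.constrain_()  # then fix alpha up in place
```

With optimizers that keep state (momentum, Adam), that state does not know about the in-place correction. This is usually acceptable, but it is worth keeping in mind when comparing it with a `get_alpha()`-style reparametrization.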