Actually, there is another learnable activation function in the paper: Swish-β = x · σ(βx). Could you please implement it in channel-shared, channel-wise, and element-wise forms? I found it difficult to implement. Thank you!
@yao-ying Going by your comment, I think the implementation would be something like this (channel-shared form, with a single scalar β):
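import torch
import torch.nn as nn

class learnableSwish(nn.Module):
    def __init__(self):
        super(learnableSwish, self).__init__()
        # A single learnable scalar beta shared by all channels.
        # The value must be a float: an integer tensor cannot require gradients.
        self.beta = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        # Swish-beta: x * sigmoid(beta * x)
        return x * torch.sigmoid(self.beta * x)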
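For the channel-wise and element-wise forms asked about above, a minimal sketch could look like the following. This is not code from the paper. The class name `SwishBeta` and the arguments `mode`, `num_channels`, `feature_shape`, and `init` are hypothetical names chosen for illustration. It assumes channels-first inputs of shape `(N, C, ...)`, and the element-wise form assumes the per-sample shape is fixed and known up front. The default `init=1.0` is also an assumption; with β = 1 the module starts out as plain Swish/SiLU, `x * sigmoid(x)`.

```python
import torch
import torch.nn as nn


class SwishBeta(nn.Module):
    """Swish-beta, x * sigmoid(beta * x), with a learnable beta.

    mode="shared":  one scalar beta for the whole tensor
    mode="channel": one beta per channel (requires num_channels)
    mode="element": one beta per element of a sample (requires feature_shape)
    """

    def __init__(self, mode="shared", num_channels=None, feature_shape=None, init=1.0):
        super().__init__()
        if mode == "shared":
            shape = ()
        elif mode == "channel":
            if num_channels is None:
                raise ValueError("num_channels is required for mode='channel'")
            shape = (num_channels,)
        elif mode == "element":
            if feature_shape is None:
                raise ValueError("feature_shape is required for mode='element'")
            shape = tuple(feature_shape)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        # Float tensor so that beta can receive gradients.
        self.beta = nn.Parameter(torch.full(shape, float(init)))

    def forward(self, x):
        beta = self.beta
        if self.mode == "channel":
            # Reshape (C,) to (1, C, 1, ..., 1) so beta broadcasts over
            # the batch dimension and any spatial dimensions.
            beta = beta.view(1, -1, *([1] * (x.dim() - 2)))
        # "shared": the scalar broadcasts everywhere.
        # "element": feature_shape broadcasts over the batch dimension.
        return x * torch.sigmoid(beta * x)


# Usage on a batch of 2 feature maps with 3 channels of size 4x4.
x = torch.randn(2, 3, 4, 4)
activations = [
    SwishBeta("shared"),
    SwishBeta("channel", num_channels=3),
    SwishBeta("element", feature_shape=(3, 4, 4)),
]
for act in activations:
    print(act.mode, tuple(act.beta.shape), tuple(act(x).shape))
```

The three forms differ only in the shape of `beta`; the forward pass relies on broadcasting, so the output always has the same shape as the input.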