Implementation of Swish: a self-gated activation function

A new activation function named “swish” came out, and I tried to implement it as a custom layer following this example (http://pytorch.org/docs/master/notes/extending.html#extending-torch-autograd) and the paper (https://arxiv.org/pdf/1710.05941.pdf).

Is this a proper way of making a custom activation function?

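import torch.nn as nn
from torch.autograd import Function

class Swish(Function):
    @staticmethod
    def forward(ctx, i):
        result = i * i.sigmoid()
        # Save the output and the input for the backward pass.
        ctx.save_for_backward(result, i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        # saved_tensors replaces the deprecated saved_variables.
        result, i = ctx.saved_tensors
        sigmoid_x = i.sigmoid()
        # f'(x) = f(x) + sigmoid(x) * (1 - f(x))
        return grad_output * (result + sigmoid_x * (1 - result))

swish = Swish.apply

class Swish_module(nn.Module):
    def forward(self, x):
        return swish(x)

swish_layer = Swish_module()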
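
One way to check whether a hand-written backward like the one above is correct is to compare it against finite differences with torch.autograd.gradcheck. The following is only a sketch: it restates the Function from the question (with ctx.saved_tensors), and the input shape and tolerances are arbitrary choices, not anything prescribed by the paper.

```python
import torch
from torch.autograd import Function


class Swish(Function):
    @staticmethod
    def forward(ctx, i):
        result = i * torch.sigmoid(i)
        ctx.save_for_backward(result, i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, i = ctx.saved_tensors
        sigmoid_x = torch.sigmoid(i)
        # d/dx [x * sigmoid(x)] = f(x) + sigmoid(x) * (1 - f(x))
        return grad_output * (result + sigmoid_x * (1 - result))


# gradcheck compares the analytic backward against numerical gradients.
# It needs double-precision inputs that require gradients.
x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
backward_ok = torch.autograd.gradcheck(Swish.apply, (x,), eps=1e-6, atol=1e-4)
print(backward_ok)
```

gradcheck raises an error describing the mismatch if the analytic and numerical gradients disagree, so it is a quick sanity test before using a custom Function in a model.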

I find it simplest to use activation functions in a functional way. The code can then be:

def swish(x):
    return x * F.sigmoid(x)

I doubt that this is the most memory-efficient implementation available right now.
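
For what it's worth, here is a sketch of one variant that tries to keep less around for the backward pass: a custom Function that saves only the input and recomputes the sigmoid when gradients are needed. The name LeanSwish is made up for this example, and whether it actually lowers peak memory in a given model is an assumption that would need to be measured.

```python
import torch
from torch.autograd import Function


class LeanSwish(Function):  # hypothetical name, not a PyTorch class
    @staticmethod
    def forward(ctx, x):
        # Keep only the input; sigmoid(x) is recomputed in backward.
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * s] = s + x * s * (1 - s) = s * (1 + x * (1 - s))
        return grad_output * (s * (1 + x * (1 - s)))


# Compare against the plain autograd expression x * sigmoid(x).
x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
y_custom = LeanSwish.apply(x)
(grad_custom,) = torch.autograd.grad(y_custom.sum(), x)

x_ref = x.detach().clone().requires_grad_(True)
y_ref = x_ref * torch.sigmoid(x_ref)
(grad_ref,) = torch.autograd.grad(y_ref.sum(), x_ref)

print(torch.allclose(y_custom, y_ref), torch.allclose(grad_custom, grad_ref))
```

The trade-off is one extra sigmoid evaluation in the backward pass in exchange for storing a single tensor per call.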

What does F stand for? Which module has to be imported as F? Could you please tell me?

@Md_Mahfujur_Rahman_0 look at some code samples in https://github.com/pytorch/examples/ and you will find F

Actually, there is another, learnable activation function in the paper: Swish-β = x · σ(βx). Could you please implement it in channel-shared, channel-wise, and element-wise forms? I found it difficult to implement. Thank you!
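
A possible sketch of the three forms, assuming NCHW inputs and relying on broadcasting: β is an nn.Parameter whose shape decides how it is shared. The class name SwishBeta, the beta_shape argument, and the initial value of 1.0 are choices made for this example and are not taken from the paper.

```python
import torch
import torch.nn as nn


class SwishBeta(nn.Module):  # hypothetical name for this sketch
    def __init__(self, beta_shape=(1,), init=1.0):
        super().__init__()
        # The shape of beta controls sharing; broadcasting does the rest.
        self.beta = nn.Parameter(torch.full(beta_shape, float(init)))

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)


channels, height, width = 3, 8, 8

shared = SwishBeta()                                    # one beta for the whole tensor
channel_wise = SwishBeta((1, channels, 1, 1))           # one beta per channel
element_wise = SwishBeta((1, channels, height, width))  # one beta per element

x = torch.randn(2, channels, height, width)
outputs = [m(x) for m in (shared, channel_wise, element_wise)]
print([tuple(o.shape) for o in outputs])
```

Since β is a regular parameter, autograd handles its gradient and an optimizer updates it together with the other weights. Note that the element-wise form ties the module to a fixed spatial size.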

F stands for torch.nn.functional
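
For reference, a small sketch of the usual import convention. It writes swish with torch.sigmoid and compares it with F.silu, which computes the same x · σ(x); this assumes a PyTorch version recent enough to ship F.silu (1.7 or later, as far as I know).

```python
import torch
import torch.nn.functional as F  # the conventional alias used in PyTorch examples


def swish(x):
    # Same expression as in the functional version above.
    return x * torch.sigmoid(x)


x = torch.linspace(-3.0, 3.0, steps=7)
matches_silu = torch.allclose(swish(x), F.silu(x))
print(matches_silu)
```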