Implementation of SWISH : a self-gated activation function

A new activation function named "swish" came out, and I tried to make a custom layer for it based on this example and the paper.

Is this a proper way of making a custom activation function?

import torch.nn as nn
from torch.autograd import Function

class Swish(Function):
    @staticmethod
    def forward(ctx, i):
        result = i * i.sigmoid()
        ctx.save_for_backward(i)  # backward needs the input to recompute sigmoid
        return result

    @staticmethod
    def backward(ctx, grad_output):
        i, = ctx.saved_tensors  # saved_variables was renamed to saved_tensors
        sigmoid_x = i.sigmoid()
        result = i * sigmoid_x
        # d/dx [x * sigmoid(x)] = swish(x) + sigmoid(x) * (1 - swish(x))
        return grad_output * (result + sigmoid_x * (1 - result))

swish = Swish.apply

class Swish_module(nn.Module):
    def forward(self, x):
        return swish(x)

swish_layer = Swish_module()
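One way to sanity-check a hand-written backward like this is `torch.autograd.gradcheck`, which compares the analytic gradient against finite differences in double precision. A minimal sketch (the class is reproduced so the snippet is self-contained):

```python
import torch
from torch.autograd import Function, gradcheck

class Swish(Function):
    @staticmethod
    def forward(ctx, i):
        ctx.save_for_backward(i)
        return i * i.sigmoid()

    @staticmethod
    def backward(ctx, grad_output):
        i, = ctx.saved_tensors
        s = i.sigmoid()
        # sigmoid(x) * (1 + x * (1 - sigmoid(x))) is the swish derivative
        return grad_output * (s * (1 + i * (1 - s)))

# gradcheck requires double precision and requires_grad inputs
x = torch.randn(5, dtype=torch.double, requires_grad=True)
print(gradcheck(Swish.apply, (x,)))  # True if the backward matches
```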

I find it simplest to use activation functions in a functional way. Then the code can be

def swish(x):
    return x * F.sigmoid(x)

I doubt it is the most memory-efficient implementation available right now, though.
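To illustrate the functional style, here is a minimal sketch of dropping such a swish into an ordinary model's forward pass (the `SmallNet` module and its layer sizes are made up for the example; `torch.sigmoid` is used to avoid the `F.sigmoid` deprecation warning):

```python
import torch
import torch.nn as nn

def swish(x):
    return x * torch.sigmoid(x)

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        # swish is called like any other function, no module needed
        return self.fc2(swish(self.fc1(x)))

net = SmallNet()
out = net(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 2])
```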

What is F? Which module has to be imported as F? Can you please tell me?

@Md_Mahfujur_Rahman_0 look at some code samples and you will find F

Actually, there is another learnable activation function in the paper: Swish-β(x) = x · σ(βx). Could you please implement it in its channel-shared, channel-wise, and element-wise forms? I found it difficult to implement. Thank you!
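A minimal sketch of a learnable Swish-β module, assuming autograd handles the backward (no custom Function needed). The `SwishBeta` name, the `mode` parameter, and the assumption of (N, C, H, W)-shaped inputs for the channel-wise case are all my own choices, not from the paper; the element-wise form would need the full feature shape known up front:

```python
import torch
import torch.nn as nn

class SwishBeta(nn.Module):
    """Swish-beta(x) = x * sigmoid(beta * x) with learnable beta.

    mode='shared'  : one scalar beta for the whole layer (channel-shared)
    mode='channel' : one beta per channel, inputs assumed (N, C, H, W)
    """
    def __init__(self, num_channels=1, mode='shared'):
        super().__init__()
        if mode == 'shared':
            self.beta = nn.Parameter(torch.ones(1))
        elif mode == 'channel':
            # shaped for broadcasting over (N, C, H, W) inputs
            self.beta = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        else:
            raise ValueError(mode)

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)

act = SwishBeta(num_channels=16, mode='channel')
y = act(torch.randn(2, 16, 8, 8))
print(y.shape)  # torch.Size([2, 16, 8, 8])
```

Since `beta` is an `nn.Parameter`, it shows up in `act.parameters()` and is trained by the optimizer like any other weight.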

F stands for torch.nn.functional
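Concretely, the conventional alias and the equivalence to the `torch.sigmoid` spelling look like this (`F.sigmoid` still works but emits a deprecation warning in recent PyTorch versions):

```python
import torch
import torch.nn.functional as F  # the conventional alias

x = torch.randn(3)
same = torch.allclose(x * torch.sigmoid(x), x * F.sigmoid(x))
print(same)  # True
```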