Implementation of SWISH: a self-gated activation function

GunHo_Choi · October 18, 2017, 4:58pm

A new activation function named “swish” came out, and I tried to implement it as a custom layer, following this example (http://pytorch.org/docs/master/notes/extending.html#extending-torch-autograd) and the paper (https://arxiv.org/pdf/1710.05941.pdf).

Is this a proper way of making a custom activation function?

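import torch.nn as nn
from torch.autograd import Function

class Swish(Function):
    @staticmethod
    def forward(ctx, i):
        # swish(x) = x * sigmoid(x)
        result = i * i.sigmoid()
        # Keep the output and the input for the backward pass
        ctx.save_for_backward(result, i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, i = ctx.saved_tensors
        sigmoid_x = i.sigmoid()
        # d/dx [x * sigmoid(x)] = swish(x) + sigmoid(x) * (1 - swish(x))
        return grad_output * (result + sigmoid_x * (1 - result))

swish = Swish.apply

class Swish_module(nn.Module):
    def forward(self, x):
        return swish(x)

swish_layer = Swish_module()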
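
One way to check whether a hand-written backward like the one above is right is to compare it numerically against finite differences and against plain autograd. The following is a minimal sketch, assuming a PyTorch version that provides `torch.autograd.gradcheck` and `ctx.saved_tensors`; the variable names `gradcheck_ok` and `max_diff` are only illustrative.

```python
import torch
from torch.autograd import Function, gradcheck

class Swish(Function):
    @staticmethod
    def forward(ctx, i):
        result = i * i.sigmoid()
        ctx.save_for_backward(result, i)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, i = ctx.saved_tensors
        sigmoid_x = i.sigmoid()
        return grad_output * (result + sigmoid_x * (1 - result))

# gradcheck compares the analytic gradient with finite differences;
# double precision keeps the numerical error small.
x = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
gradcheck_ok = gradcheck(Swish.apply, (x,), eps=1e-6, atol=1e-4)

# Second check: the custom backward should match what autograd derives
# for the plain expression x * sigmoid(x).
x.grad = None
Swish.apply(x).sum().backward()
ref = x.detach().clone().requires_grad_(True)
(ref * torch.sigmoid(ref)).sum().backward()
max_diff = (x.grad - ref.grad).abs().max().item()
```

If both checks pass, the custom `Function` computes the same gradient as the expression it wraps.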

greaber · October 19, 2017, 9:57pm

I find it simplest to use activation functions in a functional way. Then the code can be:

def swish(x):
    return x * F.sigmoid(x)

kvrd18 · October 26, 2017, 2:47pm

I doubt this is the most memory-efficient implementation available right now.
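
One possible reading of this concern: when `x * sigmoid(x)` is left to autograd, the sigmoid output is typically kept alongside the input for the backward pass. A custom `Function` can save only the input and recompute the sigmoid in backward, trading some compute for memory. The sketch below rests on that assumption; `MemoryEfficientSwish` is an illustrative name, not something from this thread.

```python
import torch
from torch.autograd import Function

class MemoryEfficientSwish(Function):
    @staticmethod
    def forward(ctx, x):
        # Save only the input; the sigmoid is recomputed in backward.
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(x)
        # d/dx [x * s] = s + x * s * (1 - s) = s * (1 + x * (1 - s))
        return grad_output * (s * (1 + x * (1 - s)))

# Compare the gradient with the one autograd derives for x * sigmoid(x).
x = torch.randn(8, dtype=torch.double, requires_grad=True)
MemoryEfficientSwish.apply(x).sum().backward()
ref = x.detach().clone().requires_grad_(True)
(ref * torch.sigmoid(ref)).sum().backward()
grads_match = torch.allclose(x.grad, ref.grad)
```

Whether this actually saves memory in a given model depends on the PyTorch version and on what else holds references to the intermediate tensors, so it is worth measuring rather than assuming.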

Md_Mahfujur_Rahman_0 · January 10, 2018, 4:54am

What does F stand for? Which module has to be imported as F? Could you please tell me here?

smth · January 10, 2018, 12:28pm

@Md_Mahfujur_Rahman_0 look at some code samples in https://github.com/pytorch/examples/ and you will find F

yao-ying · February 23, 2018, 9:29am

Actually, there is another learnable activation function in the paper: Swish-β = x · σ(βx). Could you please implement it in channel-shared, channel-wise, and element-wise forms? I found it difficult to implement. Thank you!

rahul_kanojia · September 5, 2019, 8:19am

F stands for torch.nn.functional
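
For completeness, here is a minimal sketch of the functional version with the import spelled out. The comparison against `F.silu` assumes a PyTorch release that ships that function (Swish with β = 1 is the same function as SiLU); depending on the version, `F.sigmoid` may emit a warning suggesting `torch.sigmoid`.

```python
import torch
import torch.nn.functional as F  # the conventional alias used in the thread

def swish(x):
    # x * sigmoid(x), as in the functional version above
    return x * F.sigmoid(x)

x = torch.tensor([-1.0, 0.0, 1.0])
y = swish(x)

# Assumes F.silu is available in the installed PyTorch version.
y_silu = F.silu(x)
same_as_silu = torch.allclose(y, y_silu)
```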

shalom_p · November 6, 2023, 5:25am

@yao-ying Going by your comment, I think the implementation would be something like this:

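import torch
import torch.nn as nn

class learnableSwish(nn.Module):
    def __init__(self):
        super(learnableSwish, self).__init__()
        # beta must be a floating-point tensor; an integer tensor cannot require gradients
        self.beta = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        # Swish-beta: x * sigmoid(beta * x), with a single beta shared by all channels
        x = x * torch.sigmoid(self.beta * x)
        return x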

Please let me know if you face any issues.
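
The module above covers the channel-shared case. The other two forms asked about can be sketched by changing only the shape of β and relying on broadcasting. This is a rough sketch under a few assumptions: inputs are laid out as (N, C, H, W), "channel-wise" means one β per channel, "element-wise" means one β per position of a fixed feature shape, and β starts at 1.0 so the module begins as plain Swish. `SwishBeta`, `beta_shape`, and `init` are illustrative names, not taken from the paper.

```python
import torch
import torch.nn as nn

class SwishBeta(nn.Module):
    """x * sigmoid(beta * x) with a learnable beta of a chosen shape."""

    def __init__(self, beta_shape=(), init=1.0):
        super().__init__()
        # beta_shape=() gives a single scalar; other shapes must broadcast with the input.
        self.beta = nn.Parameter(torch.full(beta_shape, float(init)))

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)

# Assumed input layout: (N, C, H, W) with C=16, H=W=8.
shared = SwishBeta()                      # one beta for the whole tensor
channel_wise = SwishBeta((1, 16, 1, 1))   # one beta per channel
element_wise = SwishBeta((16, 8, 8))      # one beta per (C, H, W) position

x = torch.randn(4, 16, 8, 8)
outs = [m(x) for m in (shared, channel_wise, element_wise)]
param_counts = [m.beta.numel() for m in (shared, channel_wise, element_wise)]
```

The element-wise form ties the module to a fixed feature-map size, so it only fits architectures where that size is known in advance.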