How to customize the activation function of a torchvision model (resnet18)?

arslansadiq · January 21, 2019, 6:50pm

Hi all,
I hope that you are having a great day.
I am implementing a paper on uncertainty estimation and using the torchvision pre-trained model ResNet-18. However, I want to use my own custom activation function instead of ReLU in the second-to-last layer of ResNet-18. How do I do that?
I searched online but found no solution.
Thank you.

ptrblck · January 22, 2019, 5:04am

By “second last layer” do you mean the second block in layer4?
If so, you could use the following code:

import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=False)  # newer torchvision versions use weights=None
model.layer4[1].relu = nn.LeakyReLU(inplace=True)  # your custom function here

Let me know if you meant another layer.

devforfu · January 22, 2019, 7:26am

As a small addition to the answer, one can inspect the model by printing it:

print(model)
# which shows something like:
# ResNet(
#  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
#  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#  (relu): ReLU(inplace)
#  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
#  (layer1): Sequential(
#    (0): BasicBlock(
#      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
#      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#      (relu): ReLU(inplace)
#      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
#      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#    )
#    (1): BasicBlock(
#      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
#      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#      (relu): ReLU(inplace)
#      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
#      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#    )
#  )
# ... the rest of layers

Now you can easily navigate to whichever layer you like, using the attribute names and indices shown in the printout, e.g. model.layer1[0].relu, model.layer2[1].conv1.weight, or any other part of the model.

arslansadiq · January 22, 2019, 3:20pm

Hi,
Thank you for your reply:
By second last layer I mean the layer before the fully connected layer, so it would be conv2 in BasicBlock 1 of layer4.

By the way, I am using a rescaled Cauchy distribution PDF (kernel), f(x) = 1/(1 + x*x), which I have implemented like this:

import torch

def pdf_cauchy_distribution(tensor):
    '''
    This function takes the output of a neural network layer and applies a
    kernel function that acts as an activation function.
    
    input:
    tensor: output of the neural network layer's computation (w*x + b)
    
    output:
    a tensor obtained by passing the input through the (rescaled) Cauchy
    PDF, f(x) = 1/(1+x^2)
    '''
    
    return (1 / (1 + torch.mul(tensor, tensor)))
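Because the function is built only from differentiable tensor operations, autograd handles the backward pass without a custom backward. A minimal check, using `x * x` in place of `torch.mul(tensor, tensor)` (the same operation): the derivative should be f'(x) = -2x / (1 + x^2)^2, so f'(1) = -0.5 and f'(2) = -0.16.

```python
import torch

def pdf_cauchy_distribution(tensor):
    # f(x) = 1 / (1 + x^2), the Cauchy PDF without its 1/pi factor
    return 1 / (1 + tensor * tensor)

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
y = pdf_cauchy_distribution(x)
print(y.detach())  # tensor([1.0000, 0.5000, 0.2000])

# Autograd differentiates the expression; compare with f'(x) = -2x / (1 + x^2)^2.
y.sum().backward()
expected = -2 * x.detach() / (1 + x.detach() ** 2) ** 2
print(torch.allclose(x.grad, expected))  # True
```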

But when I use it the way you described, I get an error:

TypeError: cannot assign 'model.activation.pdf_cauchy_distribution' as child module 'relu' (torch.nn.Module or None expected)

Does PyTorch offer any built-in functionality for custom activation functions such as the one I described?

I found this https://pytorch.org/docs/stable/distributions.html#cauchy, but I am not sure how it would serve my purpose.
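On the `torch.distributions` link: `Cauchy(loc, scale)` is a probability distribution object (sampling, `log_prob`, and so on), not an `nn.Module`, so it cannot be assigned to `relu` directly. With `loc=0` and `scale=1` its density is 1/(pi * (1 + x^2)), so the kernel above is pi times that density. The sketch below only checks this relationship numerically; the explicit formula remains the simpler choice for an activation.

```python
import math
import torch
from torch.distributions import Cauchy

x = torch.linspace(-5, 5, steps=11)

# Standard Cauchy density: p(x) = 1 / (pi * (1 + x^2))
standard_cauchy = Cauchy(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
density = standard_cauchy.log_prob(x).exp()

# The activation 1 / (1 + x^2) is that density rescaled by pi.
kernel = 1 / (1 + x * x)
print(torch.allclose(math.pi * density, kernel))  # True
```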

ptrblck · January 22, 2019, 3:58pm

Could you try to define this activation as an nn.Module and try to assign it again?
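The `TypeError` comes from `nn.Module.__setattr__`: once a name such as `relu` is registered as a child module, assigning anything other than an `nn.Module` instance or `None` to it is rejected, and a plain Python function is neither. A minimal reproduction, using a hypothetical `TinyBlock` in place of `BasicBlock`:

```python
import torch
import torch.nn as nn

def pdf_cauchy_distribution(tensor):
    return 1 / (1 + tensor * tensor)

class TinyBlock(nn.Module):  # hypothetical stand-in for torchvision's BasicBlock
    def __init__(self):
        super().__init__()
        self.relu = nn.ReLU()  # registered as a child module named "relu"

    def forward(self, x):
        return self.relu(x)

block = TinyBlock()
raised = False
try:
    block.relu = pdf_cauchy_distribution  # a plain function, not an nn.Module
except TypeError as err:
    raised = True
    print(err)  # cannot assign ... as child module 'relu' (torch.nn.Module or None expected)

block.relu = nn.Tanh()  # any nn.Module instance is accepted
```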

arslansadiq · January 22, 2019, 4:33pm

Excuse my ignorance, do you mean like this?

class cauchy_activation(nn.Module):
    def __init__(self, x):
        super(cauchy_activation, self).__init__()
        self.inp = x
        
    def activation(self):
        return pdf_cauchy_distribution(self.inp)

def pdf_cauchy_distribution(tensor):
    '''
    This function takes the output of a neural network layer and applies a
    kernel function that acts as an activation function.
    
    input:
    tensor: output of the neural network layer's computation (w*x + b)
    
    output:
    a tensor obtained by passing the input through the (rescaled) Cauchy
    PDF, f(x) = 1/(1+x^2)
    '''
    
    return (1 / (1 + torch.mul(tensor, tensor)))

If so, what should I pass when I instantiate this class, i.e. what should I feed to the class's __init__()?

arslansadiq · January 22, 2019, 4:57pm

I did it like this:

import torch
import torch.nn as nn

def pdf_cauchy_distribution(tensor):
    '''
    This function takes the output of a neural network layer and applies a
    kernel function that acts as an activation function.
    
    input:
    tensor: output of the neural network layer's computation (w*x + b)
    
    output:
    a tensor obtained by passing the input through the (rescaled) Cauchy
    PDF, f(x) = 1/(1+x^2)
    '''
    
    return (1 / (1 + torch.mul(tensor, tensor)))

class cauchy_activation(nn.Module):
    def __init__(self):
        super(cauchy_activation, self).__init__()
        
    def activation(self, inp):
        return pdf_cauchy_distribution(inp)

And in the model file:

class Resnet18(BaseModel):
    def __init__(self, classes=10):
        super(Resnet18, self).__init__()
        par = argparse.ArgumentParser(description='Model_resnet18')
        par.add_argument('-c', '--config', default = 'config.json', type=str, help = 'config file path (default: None)')
        args = par.parse_args()
        config = json.load(open(args.config))
        
        self.resnet = models.resnet18(pretrained = False)
        num_ftrs = self.resnet.fc.in_features
        self.resnet.fc = nn.Linear(num_ftrs, classes, bias=config['arch']['last_layer_bias'])
        self.resnet.layer4[1].relu = cauchy_activation.activation
        '''
        ct = 0
        for child in self.resnet.children():
            #print("child ", ct, ": \n", child)
            for param in child.parameters():
                if(ct != 9):
                    param.requires_grad = False 
                #elif(ct != 7):
                    #param.requires_grad = False
            ct += 1
        #exit
        '''
        #self.resnet.lay
    def forward(self, x_input):
        output = self.resnet(x_input)
        return output

It still gives the same error:

TypeError: cannot assign 'model.activation.cauchy_activation.activation' as child module 'relu' (torch.nn.Module or None expected)

ptrblck · January 22, 2019, 6:24pm

Could you try the following definition:

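import torch
import torch.nn as nn
from torchvision import models

# pdf_cauchy_distribution() as defined in the earlier post

class cauchy_activation(nn.Module):
    def __init__(self):
        super(cauchy_activation, self).__init__()
        
    def forward(self, x):
        # forward() receives the input tensor each time the module is called
        return pdf_cauchy_distribution(x)

model = models.resnet18(pretrained=False)
model.layer4[1].relu = cauchy_activation()
output = model(torch.randn(1, 3, 224, 224))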

arslansadiq · January 22, 2019, 6:34pm

Wow, thank you, sir, it works now.
Can you explain in one or two sentences why it works?

And is this activation function applied to the whole of layer4? (It seems to me that it is.) I wanted to apply it only to conv2 in BasicBlock 1 of layer4.

ptrblck · January 22, 2019, 6:39pm

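The assignment only accepts an nn.Module instance (or None), so I wrapped your function in a stateless module whose __init__ takes no input; the tensor is passed to forward() whenever the block calls self.relu(out).
Basically, the module now just calls your activation function without storing any parameters.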

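Partly: it will be used twice within layer4[1], since BasicBlock.forward calls self.relu once after bn1 and again after the residual addition (layer4[0] keeps its own ReLU). To apply it at only one of those points, you would need to manipulate the BasicBlock implementation here.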

arslansadiq · January 22, 2019, 6:43pm

Alright, thank you very much for your help.
God bless you.