Defining a custom leaky_relu function

I am trying to define a custom leaky_relu function based on autograd, but the code raises the error "function MyReLUBackward returned an incorrect number of gradients (expected 2, got 1)". Can you give me some advice?
Thank you so much for your help.
The code is shown below:

import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, negative_slope):
        output = input.clamp(min=0) + input.clamp(max=0) * negative_slope
        ctx.save_for_backward(input)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        negative_slope, = ctx.saved_tensors
        grad_input = grad_output.clone()
        return grad_input

dtype = torch.float
device = torch.device("cpu")

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    relu = MyReLU.apply
    y_pred = relu(x.mm(w1), 0.01).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

Hi, the problem is that the backward function expects you to return one gradient for each input argument of the forward function.
Since you don't need a gradient for the argument negative_slope, just return None for it:

@staticmethod
def forward(ctx, input, negative_slope):
    output = input.clamp(min=0) + input.clamp(max=0) * negative_slope
    ctx.save_for_backward(input)
    # negative_slope is a plain Python number, so stash it on ctx directly
    ctx.slope = negative_slope
    return output

@staticmethod
def backward(ctx, grad_output):
    input, = ctx.saved_tensors
    slope = ctx.slope
    grad_input = grad_output.clone()
    # gradient is 1 for positive inputs and `slope` for negative inputs
    grad_input = grad_input * (input > 0).float() + grad_input * (input < 0).float() * slope
    # one gradient per forward argument: None for negative_slope
    return grad_input, None

By the way, I corrected the forward and backward functions without testing them, so I hope it works.
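As a quick sanity check (a minimal sketch of my own, not part of the original reply), you can compare the gradient of the corrected Function against the built-in torch.nn.functional.leaky_relu:

import torch
import torch.nn.functional as F

x = torch.randn(8, 5, requires_grad=True)

# gradient through the custom Function
MyReLU.apply(x, 0.01).sum().backward()
grad_custom = x.grad.clone()

# gradient through the built-in leaky_relu on a fresh leaf tensor
x2 = x.detach().clone().requires_grad_()
F.leaky_relu(x2, 0.01).sum().backward()

print(torch.allclose(grad_custom, x2.grad))  # expected to print True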


yep! Thank you !!! :grin:

Hi, thank you for the custom leaky ReLU. I was wondering: if I wanted to train the negative slope value here (please correct me if I am wrong), would I need to return the gradient for the slope from backward, as I did below after calculating that gradient?

@staticmethod
def forward(ctx, input, negative_slope):
    output = input.clamp(min=0) + input.clamp(max=0) * negative_slope
    ctx.save_for_backward(input)
    ctx.slope = negative_slope
    return output

@staticmethod
def backward(ctx, grad_output):
    input, = ctx.saved_tensors
    slope = ctx.slope
    grad_input = grad_output.clone()
    grad_input = grad_input * (input > 0).float() + grad_input * (input < 0).float() * slope
    # negative_slope_gradient would be computed here
    return grad_input, negative_slope_gradient

How can I get the updated negative slope after training, and how can I check whether it is actually being trained? I also understand we need to set negative_slope.requires_grad = True. Any help is appreciated. Thank you.

Hi, the negative_slope here is a constant input, not a learnable parameter. If you want to implement a ReLU function with learnable parameters, you can refer to PReLU, for example: https://medium.com/@shoray.goel/prelu-activation-e294bb21fefa

Hope it can help you!
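For reference, here is a minimal sketch of a PReLU-style module with a learnable slope (my own illustration, not taken from the linked post; the class and parameter names are made up):

import torch
import torch.nn as nn

class LearnableLeakyReLU(nn.Module):
    """Leaky ReLU whose negative slope is a learnable parameter (PReLU-style)."""
    def __init__(self, init_slope=0.01):
        super().__init__()
        # a single learnable scalar slope, registered so optimizers will update it
        self.slope = nn.Parameter(torch.tensor(float(init_slope)))

    def forward(self, x):
        # plain tensor ops, so autograd computes the gradient w.r.t. self.slope;
        # no custom autograd.Function is needed
        return x.clamp(min=0) + self.slope * x.clamp(max=0)

After training you can read the learned slope with .slope.item(), and you can confirm it is being trained by checking that .slope.grad is non-None after a backward pass.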

Hi, thanks for your response. The medium post was helpful. I designed a trainable parameter.

I have a question on the above leaky ReLU. Have you tried running it on CUDA? On CPU the forward and backward both work, but when you move the device to CUDA the code never reaches the backward function, it only runs the forward pass. I have also created a new question in the hope of getting an answer: Custom Backward function using Function from torch.autograd fails on cuda but works on cpu

Please let me know if it works for you. It definitely doesn't work for me. I even tried mark_dirty, i.e. ctx.mark_dirty(output), and I get the following error: RuntimeError: a leaf Variable that requires grad has been used in an in-place operation. Then I also tried output.view_as(output), but I got: RuntimeError: Some elements marked as dirty during the forward method were not returned as output. The inputs that are modified inplace must all be outputs of the Function.

Sorry, I haven't tried it on the GPU. But I have implemented another custom activation function that runs on the GPU, for instance:

import torch
import torch.nn.functional as F

class Surrogate_BP_Function(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        # step function: emit 1 where the input is positive, 0 elsewhere
        out = torch.zeros_like(input).cuda()
        out[input > 0] = 1.0
        return out

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        # surrogate gradient: scaled triangular window max(1 - |input|, 0)
        grad = grad_input * 0.3 * F.threshold(1.0 - torch.abs(input), 0, 0)
        return grad

It works on GPU.
I hope it can help you.
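A minimal usage sketch (my own, variable names made up; it assumes a CUDA device because the forward calls .cuda()):

import torch

x = torch.randn(10, device="cuda", requires_grad=True)
spike = Surrogate_BP_Function.apply(x)
spike.sum().backward()  # should invoke the custom backward above
print(x.grad)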

Hey, thanks for helping, but I checked and it doesn't seem to use your custom backward pass; it uses PyTorch's default, I believe.

A way to check: set breakpoints (for example in Spyder or Jupyter) inside forward and backward, or add prints there; see the sketch below.

Second, compare your loss values on CUDA and CPU.
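For instance, a minimal sketch of that check (my own illustration, assuming the MyReLU Function from earlier in the thread, with a print added at the top of its backward):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 3, device=device, requires_grad=True)
out = MyReLU.apply(x, 0.01)
# if the custom backward runs on this device, the print inside it fires here
out.sum().backward()
print(x.grad.device, x.grad.abs().sum().item())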