Custom ReLU function slower than the built-in one?

I am trying to write a custom ReLU function whose backward pass does not apply the ReLU mask to the gradient (i.e. the gradient is passed through unchanged). The code is as follows:

import torch
import torch.nn.functional as F

class reluForward(torch.autograd.Function):
    def forward(self, inp):
        # option 1:
        # return inp * (inp > 0).float()
        # option 2:
        # return F.relu(inp).data
        # option 3:
        return F.relu(inp)

    def backward(self, grad_out):
        # pass the gradient through unchanged (no ReLU mask)
        return grad_out

Options 1 and 2 both compute the correct result but are significantly slower than the built-in nn.ReLU (they actually start out at a reasonable speed and then get slower and slower as training runs), while option 3 gives me "data must be a Tensor". Looking for any ideas about the slowdown, or a different way to do this!

Edit: I have also now tried return inp.clamp(min=0), which also computes the correct result but shows the same slowdown.

Got it! I don't know why this would cause a problem, but I was declaring self.reluForwarder = reluForward() in the Module's __init__. Moving it to just reluForwarder = reluForward() inside the forward function makes it run at the same speed as the regular ReLU (see the sketch below).
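
For reference, a minimal sketch of that placement (the MyNet module and its linear layer are made up for illustration): the point is to create a fresh reluForward instance on every call rather than storing one on the module.

import torch.nn as nn

class MyNet(nn.Module):  # hypothetical module, just to show where the Function is created
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc = nn.Linear(10, 10)
        # Don't do this: self.reluForwarder = reluForward()
        # (reusing one Function instance across iterations is what caused the slowdown)

    def forward(self, x):
        reluForwarder = reluForward()  # fresh instance per forward call
        return reluForwarder(self.fc(x))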

To avoid this mistake, keep in mind that a Function is not an nn.Module: a Function instance should only be used once.
Also, for better performance, you should use the new-style functions, as follows:

class reluForward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        # option 1:
        # return inp * (inp > 0).float()
        # option 2:
        # return F.relu(inp).data
        # option 3:
        return F.relu(inp)

    @staticmethod
    def backward(ctx, grad_out):
        # pass the gradient through unchanged (no ReLU mask)
        return grad_out

# To use it:
inp = Variable(torch.rand(10, 10))
out = reluForward.apply(inp)  # use the class here, not an instance of it!
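
A quick sanity check of the behaviour (using the Variable API from this era of PyTorch): the forward output matches ReLU, but the gradient is passed through unmodified, so even negative inputs receive a gradient of 1.

import torch
from torch.autograd import Variable

inp = Variable(torch.randn(5), requires_grad=True)
out = reluForward.apply(inp)
out.sum().backward()
print(out)       # negative entries clamped to zero by the forward
print(inp.grad)  # all ones: the backward skips the ReLU mask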