I created a custom activation function MyReLU

However, when I use it in my two-layer model, I get the error

`MyReLU.apply is not a Module subclass`

MyReLU is a subclass of `torch.autograd.Function`.

You don't use a Function in places where a Module is expected, i.e. in the `__init__` of the main module. You just invoke `MyReLU.apply` in `forward()`. If you want to use a Function in containers like `nn.Sequential`, you must wrap it in a Module.
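A minimal sketch of that wrapping, assuming the `MyReLU` Function from the question. Here its `backward` zeroes the gradient for negative inputs, as a ReLU derivative should. `MyReLUModule` is a hypothetical name:

```python
import torch
import torch.nn as nn


class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0  # ReLU derivative is 0 for negative inputs
        return grad_input


class MyReLUModule(nn.Module):
    """Hypothetical thin Module wrapper so the Function can sit inside nn.Sequential."""

    def forward(self, x):
        # the Function is never instantiated; it is only called via .apply
        return MyReLU.apply(x)


model = nn.Sequential(nn.Linear(4, 8), MyReLUModule(), nn.Linear(8, 2))
out = model(torch.randn(3, 4))
out.sum().backward()
```

The wrapper holds no state of its own; it only exists so that `nn.Sequential` receives a Module.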

When I call `MyReLU.apply` in the `forward()` method of a module, it does not work either:

```
import torch
import torch.nn as nn
from torch.autograd import Variable

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0  # ReLU derivative: no gradient for negative inputs
        return grad_input

class weightedReLU(nn.Module):
    def __init__(self, weights = 1):
        super().__init__()
        self.weights = weights* nn.Parameter(torch.ones(1))
        self.weights = Variable(self.weights.data, requires_grad=True)

    def forward(self, input):
        ex = self.weights.cuda()*MyReLU.apply(input)
        return ex
```

As Alex mentioned, they do not work in containers like `nn.Sequential`.

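I don't know; your module works for me. If you use the JIT, add an extra wrapper:

```
import torch

@torch.jit.ignore
def my_relu(x):
    # TorchScript leaves this function as a plain Python call;
    # it must not share the class name, or MyReLU.apply would be looked up on the function
    return MyReLU.apply(x)
```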

Does it also work when you wrap the model inside `nn.Sequential`?

What's the difference between defining a class as an autograd `Function` or as an `nn.Module`?

Yes, I think your error comes from somewhere else:

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.s = nn.Sequential(nn.ELU(), weightedReLU())

    def forward(self, x):
        return self.s(x)

m = M()
y = m(torch.ones(5).requires_grad_())
y.sum().backward()
```

(your code untouched, except that I removed `.cuda()`)

`Function` is not related to `nn.Module`; you don't even create instances of it explicitly, it is just a way to provide `backward()`. The `Module` base class has all the usual facilities for submodules, parameters, module-tree enumeration, etc.
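Since a Function only supplies a `backward()`, one way to check that hand-written gradient is `torch.autograd.gradcheck`, which compares it with finite differences. A minimal sketch, assuming the ReLU-style `MyReLU` from above; the test inputs are kept away from the kink at 0 and use double precision, as gradcheck expects:

```python
import torch


class MyReLU(torch.autograd.Function):
    # no __init__ and no instances: both methods are static,
    # and the Function is only ever used through MyReLU.apply(...)
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        # pass the gradient through only where the input was positive
        return grad_output * (input > 0).to(grad_output.dtype)


# double precision and values away from 0, where ReLU is not differentiable
x = torch.tensor([-2.0, -0.5, 0.5, 3.0], dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(MyReLU.apply, (x,))
print(ok)
```

`gradcheck` raises an error by default if the analytical and numerical gradients disagree, so reaching the `print` already means the check passed.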

Thanks for your answer.

Suppose I need to include a learnable parameter in an autograd Function, similar to the module below:

```
import torch

class CustAct(torch.nn.Module):
    def __init__(self, alphas=1):
        super(CustAct, self).__init__()
        self.alphas = alphas*torch.nn.Parameter(torch.ones(1))
        self.alphas.requiresGrad = True

    def forward(self, x):
        val = self.alphas.cuda() * x
        return val
```

How can I include this in an autograd Function?

Is there a way to remove `.cuda()` so that it automatically handles both CPU and CUDA? Even when I call `.cuda()` on the `CustAct` module, I still need to move `self.alphas` to the GPU explicitly.
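If custom gradients are really needed, a common pattern is to keep the `nn.Parameter` in the Module and pass it to the Function as an extra input; `backward()` then returns one gradient per input, in order. A minimal sketch computing `alpha * x`; `ScaleFunction` is a hypothetical name, and since `alpha` has shape `(1,)` and is broadcast over `x`, its gradient is summed back down to that shape:

```python
import torch
import torch.nn as nn


class ScaleFunction(torch.autograd.Function):
    """Hypothetical Function computing alpha * x with a hand-written backward."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.save_for_backward(x, alpha)
        return alpha * x

    @staticmethod
    def backward(ctx, grad_output):
        x, alpha = ctx.saved_tensors
        grad_x = grad_output * alpha
        # alpha was broadcast over x, so reduce its gradient back to alpha's shape
        grad_alpha = (grad_output * x).sum().reshape(alpha.shape)
        # one gradient per forward() input, in the same order
        return grad_x, grad_alpha


class CustAct(nn.Module):
    def __init__(self, alphas=1.0):
        super().__init__()
        # the Module owns the trainable parameter; the Function only receives it
        self.alphas = nn.Parameter(torch.full((1,), float(alphas)))

    def forward(self, x):
        return ScaleFunction.apply(x, self.alphas)


act = CustAct(2.0)
x = torch.tensor([1.0, -3.0, 4.0], requires_grad=True)
act(x).sum().backward()
print(act.alphas.grad)  # sum(x) = 2.0
print(x.grad)           # alpha (2.0) for every element
```

Because `self.alphas` is a registered parameter, `act.cuda()` or `act.to(device)` moves it together with the module.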

```
import torch

class CustAct(torch.nn.Module):
    def __init__(self, alphas=1.0):
        super().__init__()
        # trainable: float fill value, since only floating-point tensors can require grad
        self.alphas = torch.nn.Parameter(torch.full((1,), float(alphas)))
        # non-trainable constant: a buffer is moved by .cuda()/.to() and saved in state_dict;
        # multiply by it in forward() if you want to keep it separate
        self.register_buffer("scale", torch.tensor([float(alphas)]))
        # a plain Python number works too (under a name that is not a registered buffer):
        # self.scale_value = alphas

    def forward(self, x):
        # val = customAutogradFunction(x, self.alphas)  # only if you must provide custom gradients
        val = x * self.alphas  # no need for autograd.Function
        # val = x.clamp(min=0)  # your MyReLU as is also needs no autograd.Function
        return val
```

That's because of the way you used multiplication: your `self.alphas` became a plain Tensor, not an `nn.Parameter`, so it was not moved. Another idiom is `tensorA2 = tensorA.to(tensorB)`, which changes the device and dtype to match `tensorB`, but it is not usually needed.
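The same rule can be seen on a CPU-only machine by converting dtype instead of device: `Module.to()` (like `.cuda()`) only converts registered parameters and buffers. A small sketch with a hypothetical `Demo` module:

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.registered = nn.Parameter(torch.ones(1))  # tracked by the module
        self.plain = 2 * nn.Parameter(torch.ones(1))   # result is a plain Tensor, not tracked


m = Demo().to(torch.float64)  # .cuda() / .to(device) follow the same rule
print(m.registered.dtype)  # torch.float64: registered parameters are converted
print(m.plain.dtype)       # torch.float32: the untracked tensor is left behind

# the tensor.to(other_tensor) idiom: match other_tensor's device and dtype
a = torch.ones(3)
b = torch.zeros(2, dtype=torch.float64)
a2 = a.to(b)
print(a2.dtype)  # torch.float64
```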

What is the effect of

in your code?

I am loading some checkpoints, and with this method the loader asks for the values of alphas. Is there any workaround for this problem?

How could I write a function so that it returns the gradients with respect to alphas separately, not coupled with the other weights?

Is there a way to have a separate alpha for each neuron? It seems that in this code you use the same `self.alphas` for all elements of the n-dimensional array `x`.
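One common way to get a separate alpha per neuron (a sketch, not something stated in the thread) is to give the parameter one entry per feature and let broadcasting apply it; `PerNeuronScale` is a hypothetical name, and features are assumed to be the last dimension. The built-in `nn.PReLU(num_parameters=...)` follows a similar per-channel idea.

```python
import torch
import torch.nn as nn


class PerNeuronScale(nn.Module):
    """Hypothetical activation with one learnable alpha per feature (last dim)."""

    def __init__(self, num_features, init=1.0):
        super().__init__()
        self.alphas = nn.Parameter(torch.full((num_features,), float(init)))

    def forward(self, x):
        # x: (batch, num_features); alphas broadcasts across the batch dimension
        return self.alphas * x.clamp(min=0)


act = PerNeuronScale(3)
x = torch.tensor([[1.0, -2.0, 3.0], [4.0, 5.0, -6.0]])
act(x).sum().backward()
print(act.alphas.grad)  # per-feature sum of relu(x) over the batch: [5., 5., 3.]
```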

That code was purely illustrative; `register_buffer` is just a way to store non-trainable tensors in a module (sometimes it is more convenient even for scalars).
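A short sketch of the difference (the module name `WithBuffer` is hypothetical): a buffer is saved in `state_dict()` and moved with the module, but it is not returned by `parameters()`, so an optimizer never updates it.

```python
import torch
import torch.nn as nn


class WithBuffer(nn.Module):
    def __init__(self, alphas=1.0):
        super().__init__()
        self.alphas = nn.Parameter(torch.full((1,), float(alphas)))  # trainable
        self.register_buffer("scale", torch.tensor([float(alphas)]))  # non-trainable

    def forward(self, x):
        return self.scale * self.alphas * x


m = WithBuffer(2.0)
print([name for name, _ in m.named_parameters()])  # ['alphas']: what an optimizer sees
print(list(m.state_dict().keys()))                  # ['alphas', 'scale']: what a checkpoint stores
```

Because the buffer is part of `state_dict()`, a checkpoint saved without it will report it as missing on a strict load.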

Why do you want to return gradients manually in the first place? You usually only do this for external (and thus non-differentiable) computations, e.g. C++ extensions, or when you have a simplified or more efficient gradient formula. Otherwise you only write `forward()` for an `nn.Module`, and most tensor operations already know how to compute their gradients.
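A minimal sketch of that, reusing the `CustAct` idea with only a `forward()`: after `backward()`, each parameter's gradient sits in its own `.grad`, and `torch.autograd.grad` can compute the gradient with respect to `alphas` alone.

```python
import torch
import torch.nn as nn


class CustAct(nn.Module):
    def __init__(self, alphas=1.0):
        super().__init__()
        self.alphas = nn.Parameter(torch.full((1,), float(alphas)))

    def forward(self, x):
        # only forward() is written; autograd derives the gradients
        return self.alphas * x.clamp(min=0)


act = CustAct(2.0)
x = torch.tensor([1.0, -3.0, 4.0], requires_grad=True)
act(x).sum().backward()
print(act.alphas.grad)  # sum(relu(x)) = 5.0
print(x.grad)           # alpha where x > 0, else 0: [2., 0., 2.]

# gradient w.r.t. alphas alone, without accumulating into any .grad
(g_alpha,) = torch.autograd.grad(act(x).sum(), act.alphas)
print(g_alpha)  # 5.0
```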

For this code, does it mean that the weights are no longer trainable, since they became a plain tensor?

```
class weightedReLU(nn.Module):
    def __init__(self, weights = 1):
        super().__init__()
        self.weights = weights* nn.Parameter(torch.ones(1))
        self.weights = Variable(self.weights.data, requires_grad=True)

    def forward(self, input):
        ex = self.weights.cuda()*MyReLU.apply(input)
        return ex
```

I don't know what effect such a reassignment has, because that's not idiomatic PyTorch code, to put it mildly. `Variable` is obsolete, and that line shouldn't be there at all. As for the first assignment: yes, it may train, but your parameter won't be registered.

I added the `Variable` line because otherwise `backward()` asked for `retain_graph=True`.

It seems that `Variable` registers the gradient of the weights in the backward pass.

This code and yours produce completely different results.

No, that's probably the error you get if you call `backward()` twice.

Actually, `weights * nn.Parameter(torch.ones(1))` won't train, because the optimizer won't find this parameter, so there is that. Oh, and then you may get that error. What a mess.

My code works fine with nn.ReLU, and with this custom activation it works fine as well after introducing `Variable` (if I were calling backward twice, it would be a problem with nn.ReLU too).

If you look at this post, it suggests doing the multiplication for the trainable variable.

This creates an untracked parameter: it is not in the module, only in the backward graph. The optimizer doesn't process it, so you get an error on the second iteration.

```
self.weights = Variable(self.weights.data, requires_grad=True)
```

This probably just makes the first assignment have no effect, and is equivalent to

```
self.weights = nn.Parameter(torch.ones(1) * weights)
```

or with `torch.full((1,), weights)` as the inner tensor.
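The registration difference is easy to check directly. A small sketch with two hypothetical modules: multiplying a Parameter produces a plain Tensor, so nothing is registered, while wrapping the final value in `nn.Parameter` registers it.

```python
import torch
import torch.nn as nn


class Unregistered(nn.Module):
    def __init__(self, weights=1.0):
        super().__init__()
        # multiplying a Parameter yields a plain Tensor, so nothing is registered
        self.weights = weights * nn.Parameter(torch.ones(1))


class Registered(nn.Module):
    def __init__(self, weights=1.0):
        super().__init__()
        # wrapping the final value in nn.Parameter registers it
        self.weights = nn.Parameter(torch.ones(1) * weights)


print(len(list(Unregistered(2.0).parameters())))  # 0: the optimizer never sees it
print(len(list(Registered(2.0).parameters())))    # 1
print(list(Registered(2.0).state_dict().keys()))  # ['weights']
```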

```
list(LearnedSwish().parameters())
[]
```

Good luck optimizing that.

I tried this and printed the weights. It seems that they are not training and stay constant. Even when I load the checkpoints, the loader does not ask for the values of those parameters across the layers.

For this version,

`self.weights = nn.Parameter(torch.ones(1) * weights)`

is there a way to make the loader skip these values when loading from a checkpoint trained with the conventional nn.ReLU activation?

Not sure. There is `nn.Module.load_state_dict(state_dict, strict=False)`, e.g. `model.load_state_dict(torch.load(PATH), strict=False)`. If you're using a training-loop framework, check its docs.
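A minimal sketch of `strict=False`, assuming a checkpoint saved from an `nn.ReLU` model and loaded into a model with a learnable-weight activation; `WeightedReLU` is a hypothetical name, and `old_model.state_dict()` stands in for `torch.load(PATH)`. The returned object lists the keys that were skipped, and those parameters keep their initial values.

```python
import torch
import torch.nn as nn


class WeightedReLU(nn.Module):
    def __init__(self, weights=1.0):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(1) * weights)

    def forward(self, x):
        return self.weights * x.clamp(min=0)


old_model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())      # checkpoint source
new_model = nn.Sequential(nn.Linear(4, 4), WeightedReLU())

checkpoint = old_model.state_dict()  # stands in for torch.load(PATH)
result = new_model.load_state_dict(checkpoint, strict=False)
print(result.missing_keys)     # ['1.weights']: left at their initial value
print(result.unexpected_keys)  # []
```

Checking `missing_keys` after loading is a cheap way to confirm that only the new activation weights were skipped.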