Efficiently applying per-neuron activation functions

I want to use a custom activation function that has a random component that gets applied to every neuron individually.

If I use the standard method and call the activation function on a layer, it applies the same random value to every neuron in that layer.

I am looking for the most efficient way to have the activation function affect every neuron individually, and would appreciate any advice on the topic.

The only method I currently know of involves splitting the input tensor, applying the function to each part, and then concatenating the results, but I am hoping there is a more efficient way.

Hi,

That will depend a lot on what your function is. Can you be more precise about what it is, or provide a code sample that shows the split/concat version?

The function is going to be a modified mish with a random coefficient in the exponential.

The split/concat version I was talking about was referencing this forum post: How to apply different activation functions to different neurons
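Roughly, the split/concat approach looks like this (my own sketch, not the exact code from that thread; the sizes and the two activation functions are just placeholders):

import torch

x = torch.randn(32, 4)           # output of a linear layer: (batch, 4 neurons)
a, b = torch.split(x, 2, dim=1)  # split the neurons into two groups
a = torch.relu(a)                # one activation for the first group
b = torch.tanh(b)                # a different activation for the second group
out = torch.cat([a, b], dim=1)   # concatenate the neurons back together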

If you want a modified mish, then I would implement the function using just regular element-wise functions. And then you create a Tensor with the coefficient you want and can add/multiply it at any place you need in the computation.
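For example, something along these lines (just a rough sketch; the shapes and the place where the coefficient is multiplied in are placeholders, since I don't know your exact function):

import torch
import torch.nn.functional as F

x = torch.randn(32, 500)                    # (batch, neurons)
coef = torch.rand(500)                      # one coefficient per neuron
out = x * torch.tanh(F.softplus(coef * x))  # coefficient multiplied in element-wise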

I am sorry, I should have been clearer about the activation function.

x * torch.tanh(F.softplus(x, beta=random.uniform(self.beta_lower, self.beta_higher)))

This is the function I want to use. And I want to know the best way to get a different beta for each neuron.

Thanks for your patient response!

Hi,

I’m afraid the softplus function only accepts a single beta value, but you can just re-implement it:

batch_size = x.size(0)
# Generate one beta per sample
beta = torch.empty(batch_size).uniform_(self.beta_lower, self.beta_higher)
# Softplus re-implemented element-wise, so beta can be a tensor
softplus = ((beta * x).exp() + 1).log() / beta
res = x * torch.tanh(softplus)
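For example, wrapped into a small module (the class name and re-sampling beta on every forward pass are my own assumptions, and this still assumes x is 1D, one value per sample, as in the snippet above):

import torch
import torch.nn as nn

class RandomBetaMish(nn.Module):
    def __init__(self, beta_lower, beta_higher):
        super().__init__()
        self.beta_lower = beta_lower
        self.beta_higher = beta_higher

    def forward(self, x):
        # one beta per sample, drawn anew on every forward pass
        beta = torch.empty(x.size(0), device=x.device).uniform_(self.beta_lower, self.beta_higher)
        softplus = ((beta * x).exp() + 1).log() / beta
        return x * torch.tanh(softplus)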

Would that match your needs?

I think it just might!

Thanks a lot!

I might be able to fix this by fiddling around with the shapes, but I am currently getting this error:

The size of tensor a (64) must match the size of tensor b (500) at non-singleton dimension 1

Where does it happen?
Note that the code above assumes x is 1D. You might need to update that.
For example, if x is 2D:

  • If you want beta to be the same for all elements of a given sample, you can simply add beta = beta.unsqueeze(-1) to give beta a new dimension of size 1, and the broadcasting logic will take care of expanding it.
  • If you want a different beta for every element of every sample, then just update the size given to torch.empty() to reflect the full size of x. Both options are sketched below.
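A rough sketch of both options for a 2D x (the shapes and the beta range are just example values):

import torch

x = torch.randn(64, 500)                        # (batch_size, num_features)
beta_lower, beta_higher = 0.5, 2.0              # example range
batch_size, num_features = x.shape

# Option 1: one beta per sample, broadcast across the feature dimension
beta = torch.empty(batch_size).uniform_(beta_lower, beta_higher).unsqueeze(-1)  # (batch_size, 1)

# Option 2: a different beta for every element of every sample
# beta = torch.empty(batch_size, num_features).uniform_(beta_lower, beta_higher)

softplus = ((beta * x).exp() + 1).log() / beta  # broadcasting expands beta to x's shape
res = x * torch.tanh(softplus)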

The unsqueezing fixed it.
Thanks a lot!

Hi, I am also trying to apply a different activation function per neuron in a simple MLP. Can you please share how you managed to split the tensors from a linear layer to apply different activation functions? I want the first layer of my model to have seven nodes and apply a different activation function to each node. I tried index_select and the other technique mentioned, but I can't get it to work. I am getting errors like this:

The size of tensor a (64) must match the size of tensor b (500) at non-singleton dimension 1

Maybe the problem stems from the fact that I don't know what a tensor representing the layer looks like, and I am simply indexing it the way one would index a list. Could you also point me to where I can find more info about this? Thanks, and sorry for the long paragraph.