Efficiently applying per-neuron activation functions

I want to use a custom activation function that has a random component that gets applied to every neuron individually.

If I use the standard method and call the activation function on a layer, it applies the same random value to every neuron in that layer.

I am looking for the most efficient way to have the activation function affect every neuron individually, and would appreciate any advice on the topic.

The only method I currently know of involves splitting the input tensor, applying the function to each part, and then concatenating the results, but I am hoping there is a more efficient way.

Hi,

That will depend a lot on what your function is. Can you be more precise about what it is, or provide a code sample that shows the split/concat version?

The function is going to be a modified mish with a random coefficient in the exponential.

The split/concat version I was talking about was referencing this forum post: How to apply different activation functions to different neurons
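Roughly, the split/concat approach looks like this (my own sketch, not the exact code from that thread; the sizes and the two activation functions are just placeholders):

import torch

x = torch.randn(32, 4)           # output of a linear layer: (batch, 4 neurons)
a, b = torch.split(x, 2, dim=1)  # split the neurons into two groups
a = torch.relu(a)                # one activation for the first group
b = torch.tanh(b)                # a different activation for the second group
out = torch.cat([a, b], dim=1)   # concatenate the neurons back together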

If you want a modified mish, then I would implement the function using just regular element-wise functions. And then you create a Tensor with the coefficient you want and can add/multiply it at any place you need in the computation.
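For example, something along these lines (just a rough sketch; the shapes and the place where the coefficient is multiplied in are placeholders, since I don't know your exact function):

import torch
import torch.nn.functional as F

x = torch.randn(32, 500)                    # (batch, neurons)
coef = torch.rand(500)                      # one coefficient per neuron
out = x * torch.tanh(F.softplus(coef * x))  # coefficient multiplied in element-wise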

I am sorry, I should have been clearer about the activation function.

x * torch.tanh(F.softplus(x, beta=random.uniform(self.beta_lower, self.beta_higher)))

This is the function I want to use. And I want to know the best way to get a different beta for each neuron.

Thanks for your patient response!

Hi,

I’m afraid the softplus function only accepts a single beta value, but you can just re-implement it:

batch_size = x.size(0)
# Generate one beta per sample
beta = torch.empty(batch_size).uniform_(self.beta_lower, self.beta_higher)
# Softplus re-implemented element-wise, so beta can be a tensor
softplus = ((beta * x).exp() + 1).log() / beta
res = x * torch.tanh(softplus)
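For example, wrapped into a small module (the class name and re-sampling beta on every forward pass are my own assumptions, and this still assumes x is 1D, one value per sample, as in the snippet above):

import torch
import torch.nn as nn

class RandomBetaMish(nn.Module):
    def __init__(self, beta_lower, beta_higher):
        super().__init__()
        self.beta_lower = beta_lower
        self.beta_higher = beta_higher

    def forward(self, x):
        # one beta per sample, drawn anew on every forward pass
        beta = torch.empty(x.size(0), device=x.device).uniform_(self.beta_lower, self.beta_higher)
        softplus = ((beta * x).exp() + 1).log() / beta
        return x * torch.tanh(softplus)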

Would that match your needs?

I think it just might!

Thanks a lot!

I might be able to fix this by fiddling around with the shapes, but I am currently getting this error:

The size of tensor a (64) must match the size of tensor b (500) at non-singleton dimension 1

Where does it happen?
Note that the code above assumes x is 1D. You might need to update that.
For example, if x is 2D:

  • If you want beta to be the same for all elements of a given sample, you can simply add beta = beta.unsqueeze(-1) to give beta a new dimension of size 1, and the broadcasting logic will take care of expanding it.
  • If you want a different beta for every element of every sample, then just update the size given to torch.empty() to reflect the full size of x. Both options are sketched below.
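A rough sketch of both options for a 2D x (the shapes and the beta range are just example values):

import torch

x = torch.randn(64, 500)                        # (batch_size, num_features)
beta_lower, beta_higher = 0.5, 2.0              # example range
batch_size, num_features = x.shape

# Option 1: one beta per sample, broadcast across the feature dimension
beta = torch.empty(batch_size).uniform_(beta_lower, beta_higher).unsqueeze(-1)  # (batch_size, 1)

# Option 2: a different beta for every element of every sample
# beta = torch.empty(batch_size, num_features).uniform_(beta_lower, beta_higher)

softplus = ((beta * x).exp() + 1).log() / beta  # broadcasting expands beta to x's shape
res = x * torch.tanh(softplus)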

The unsqueezing fixed it.
Thanks a lot!

Hi, I am also trying to apply a different activation function per neuron in a simple MLP. Can you please share how you managed to split the tensors from a linear layer to apply different activation functions? I want the first layer of my model to have seven nodes and apply a different activation function to each node. I tried index_select and the other technique mentioned, but I can't get it to work. I am getting errors like this:

The size of tensor a (64) must match the size of tensor b (500) at non-singleton dimension 1

Maybe the problem stems from the fact that I don't know what a tensor representing the layer looks like, and I am simply indexing it the way one would index a list. Could you also point me to where I can find more info about this? Thanks, and sorry for the long paragraph.