Experiencing SELU Woes

Has anyone successfully used SELU in a network with residual or skip connections? If so, how did you set it up, and what changes to the architecture did you have to make to get it to converge without exploding gradients?

Same question here. I was able to get similar performance to batchnorm with SELU but not better.

Isn’t similar performance a win since SELU is faster?

Yes, SELU definitely runs faster. I would still like to hear about a use case where it substantially improved the results of an experiment.
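
For anyone who wants something concrete to experiment with, here is a minimal pure-Python sketch of a SELU residual block. It is not taken from anyone's working setup in this thread. It assumes LeCun-normal weights (variance 1/fan_in), which is what the SELU paper recommends, and it rescales the skip sum by 1/sqrt(2). That rescaling is only a heuristic I am assuming here: if the skip path and the branch each have roughly unit variance, their sum has roughly twice that, which may be one reason activations drift in residual nets. The names `residual_block` and `stack_variance` are made up for illustration.

```python
import math
import random

# Constants from the SELU definition (Klambauer et al., 2017).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """SELU applied to a single float."""
    return SCALE * x if x > 0 else SCALE * ALPHA * (math.exp(x) - 1.0)


def lecun_normal(fan_in, fan_out, rng):
    """Weight matrix with N(0, 1/fan_in) entries, one row per output unit."""
    std = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def dense(weights, x):
    """Bias-free linear layer applied to one input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def residual_block(w1, w2, x):
    """Hypothetical block: y = (x + selu(W2 selu(W1 x))) / sqrt(2).

    The 1/sqrt(2) factor is an assumed heuristic to keep the variance
    of the sum near 1, not part of SELU itself.
    """
    h = [selu(v) for v in dense(w1, x)]
    f = [selu(v) for v in dense(w2, h)]
    return [(a + b) / math.sqrt(2.0) for a, b in zip(x, f)]


def stack_variance(depth, width, batch, seed=0):
    """Push a random batch through `depth` blocks; return the output variance."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch)]
    for _ in range(depth):
        w1 = lecun_normal(width, width, rng)
        w2 = lecun_normal(width, width, rng)
        xs = [residual_block(w1, w2, x) for x in xs]
    flat = [v for x in xs for v in x]
    mean = sum(flat) / len(flat)
    return sum((v - mean) ** 2 for v in flat) / len(flat)
```

Calling something like `stack_variance(depth=8, width=32, batch=64)` with and without the rescaling is a cheap way to see whether the activation variance stays bounded before trying the same idea in a real framework.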