ReLU vs LeakyReLU vs PReLU

What are the advantages and disadvantages of using each of them?

Is the general ranking ReLU < LeakyReLU < PReLU correct?

ReLU outputs zero whenever the input is below zero. This “flat line” at zero makes gradient descent difficult, because the gradient of a flat line is zero, so units that end up there stop learning. LeakyReLU was introduced to resolve this: its negative part is nearly flat, but not exactly flat, so a small gradient still flows. PReLU goes one step further and learns that negative slope as a parameter instead of fixing it.
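A minimal NumPy sketch of the three functions and their input gradients, just to make the "flat line" point concrete (the 0.01 slope and the 0.25 PReLU value below are common defaults, not anything prescribed by the answer):

```python
import numpy as np

def relu(x):
    # Zero for negative inputs: the gradient there is exactly 0 ("dying ReLU").
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Small fixed slope on the negative side keeps a nonzero gradient.
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # Same shape as LeakyReLU, but the negative slope `a` is a learned parameter.
    return np.where(x > 0, x, a * x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)        # 0 everywhere below zero

def leaky_relu_grad(x, slope=0.01):
    return np.where(x > 0, 1.0, slope)    # small but nonzero below zero

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x), relu_grad(x))
print(leaky_relu(x), leaky_relu_grad(x))
print(prelu(x, a=0.25))                   # 0.25 is a typical PReLU initialization
```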

This paper compares various activation functions, and the results are quite varied. There doesn’t seem to be one obviously superior activation function, so I don’t think that general ranking is entirely accurate. If you also look at this lecture by Justin Johnson, you can see a nice summary graph of the paper I referenced, and I recommend listening to the part of the lecture about activation functions to gain some deeper insight.

But to be brief, in case you don’t have the time to watch the lecture: from my understanding there is no single superior activation function. Which one works best varies on a case-by-case basis, so people usually just stick with ReLU.
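If you want to check this on your own problem, one common approach is to treat the activation as a hyperparameter and swap it into an otherwise identical model. A hypothetical PyTorch sketch (the layer sizes here are made up for illustration):

```python
import torch.nn as nn

def make_mlp(activation: nn.Module) -> nn.Sequential:
    # Identical architecture for every run; only the activation differs.
    return nn.Sequential(
        nn.Linear(784, 256),
        activation,
        nn.Linear(256, 10),
    )

candidates = {
    "relu": nn.ReLU(),
    "leaky_relu": nn.LeakyReLU(negative_slope=0.01),
    "prelu": nn.PReLU(),  # negative slope is a learnable parameter
}

# Train and evaluate each variant with the same data and schedule, then compare.
models = {name: make_mlp(act) for name, act in candidates.items()}
```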