Hi,
When you say it does not work, what do you mean exactly? Does it crash, or does it fail to train?
I am not an expert at all, but I think that in batchnorm at least, you actually want to backprop through the mean and std computation; not doing so makes performance much worse.
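
To make that concrete, here is a small pure-Python sketch of what changes when the mean and std are treated as constants. It is not how any framework implements batchnorm; the function names (`normalize`, `grad_full`, `grad_detached`, `grad_numeric`), the toy loss and the input values are all made up for illustration. It normalizes a 1-D batch, then compares the gradient that flows through the statistics with the one you get if the statistics are held fixed, using a finite-difference gradient as the reference.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)


def normalize(x):
    """Normalize a 1-D batch; return the normalized values and 1/std."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + EPS)
    return [(v - mean) * inv_std for v in x], inv_std


def loss(x, w):
    """Toy loss: weighted sum of the normalized values."""
    xhat, _ = normalize(x)
    return sum(wi * yi for wi, yi in zip(w, xhat))


def grad_full(x, w):
    """Gradient of the loss w.r.t. x, backpropagating through mean and std."""
    xhat, inv_std = normalize(x)
    n = len(x)
    sum_dy = sum(w)
    sum_dy_xhat = sum(wi * yi for wi, yi in zip(w, xhat))
    return [inv_std / n * (n * wi - sum_dy - yi * sum_dy_xhat)
            for wi, yi in zip(w, xhat)]


def grad_detached(x, w):
    """Gradient if mean and std are treated as constants (no backprop through them)."""
    _, inv_std = normalize(x)
    return [wi * inv_std for wi in w]


def grad_numeric(x, w, h=1e-6):
    """Central finite-difference gradient, used as the reference."""
    grads = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grads.append((loss(xp, w) - loss(xm, w)) / (2 * h))
    return grads


x = [1.0, 2.0, 4.0, 7.0]
w = [0.5, -1.0, 2.0, 0.3]
print("full    :", grad_full(x, w))
print("detached:", grad_detached(x, w))
print("numeric :", grad_numeric(x, w))
```

In this toy case the full gradient agrees with the finite-difference one, while the detached version does not, so the optimizer would be following a direction that is not the true gradient of the loss. Whether that explains your problem depends on what "does not work" means in your case.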