Is LayerNorm deterministic?

It isn’t for me, and I want to confirm whether that’s normal.

More specifically, across different builds I get different results for the same input.

I don’t think there is any guarantee of bitwise-identical results between different PyTorch and library versions (e.g. different cuBLAS, cuDNN, etc.).
The results should be deterministic with the same setup (PyTorch and library versions, GPU, etc.) if you follow the Reproducibility docs.
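The underlying reason is that floating-point addition is not associative, so reductions (like the mean/variance inside LayerNorm) computed in a different order, e.g. by a different GPU or cuBLAS version, can round differently. A minimal plain-Python sketch of the effect (not PyTorch code, just an illustration):

```python
# Floating-point addition is not associative: summing the same values
# in a different order can produce a different result.
vals = [1e16, 1.0, -1e16, 1.0]

in_order = sum(vals)            # 1e16 + 1.0 rounds back to 1e16, the 1.0 is lost
reordered = sum(sorted(vals))   # different accumulation order, different rounding

print(in_order, reordered)      # the two sums disagree
```

Different hardware and library versions effectively pick different accumulation orders, which is why bitwise reproducibility is only promised within one fixed setup.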

Sorry, by “builds” I meant different training runs. The GPU could be different, though.

That would explain it, then (a different GPU).