How to determine the maximum batch size for a network so it doesn't OOM?

I'm going to implement a pipeline of several networks running in a distributed environment. It needs to work on different types of GPUs (the amount of memory is the key constraint).
The problem is that I don't know how to determine the maximum batch size for these networks, so I won't get an OOM when starting inference. I'm surely not the first one to deploy networks in production on different devices, so maybe there is an accepted way of calculating the batch size, or is trial and error the only option?

Even if you carefully calculate the model size, activations, batch memory, etc. and make sure everything fits into GPU memory, a few extra bytes (fragmentation, framework overhead, CUDA context) can still produce an OOM, so experimentation is still the best option.
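One common way to automate that experimentation is to probe at startup: run a dummy forward pass at increasing batch sizes and catch the OOM. Here is a minimal framework-agnostic sketch; `try_batch` is a hypothetical callable you would supply, which runs one inference pass at the given batch size and returns `False` on OOM (in PyTorch, for example, by catching `torch.cuda.OutOfMemoryError` and calling `torch.cuda.empty_cache()` before returning):

```python
def find_max_batch_size(try_batch, upper_bound=4096):
    """Find the largest batch size n <= upper_bound for which try_batch(n)
    succeeds, assuming success is monotone (if n works, so does n - 1).

    try_batch(n) -> bool is user-supplied: run one dummy forward pass
    with batch size n, return True on success, False on OOM.
    Returns 0 if even batch size 1 fails.
    """
    # Phase 1: grow exponentially until the first failure brackets the limit.
    lo, hi = 0, 1                      # lo = known-good, hi = candidate
    while hi <= upper_bound and try_batch(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, upper_bound + 1)      # don't search past the cap

    # Phase 2: binary search the bracket (lo works, hi fails or is the cap).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_batch(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

In practice you'd run this once per device at service startup (with inputs shaped like your real workload, since activation memory depends on input size), then use a slightly smaller batch than the result as a safety margin against fragmentation and concurrent allocations.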