torch.cuda.empty_cache() behavior with multiple GPUs

A few of my jobs hung indefinitely at torch.cuda.empty_cache(). Each job has access to 2 GPUs (Tesla P4s, 8 GB each). Could someone explain, or point me to documentation on, how empty_cache works when a job uses multiple GPUs?
I came across a thread where a few people reported similar issues, but I can't find it anymore.
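To narrow down where the hang happens, here is a minimal diagnostic sketch, assuming a reasonably recent PyTorch where torch.cuda.memory_reserved exists. It makes each visible GPU current in turn with torch.cuda.device(i), synchronizes it, then records reserved memory before and after torch.cuda.empty_cache(). The helper name empty_cache_per_device is hypothetical. Whether empty_cache() releases only the current device's cache or every device's cache may depend on the PyTorch version, so looping over devices is a conservative assumption rather than a documented requirement.

```python
import torch


def empty_cache_per_device():
    """Hypothetical helper: release cached blocks on every visible GPU and
    return {device_index: (reserved_before, reserved_after)} in bytes."""
    report = {}
    if not torch.cuda.is_available():
        return report  # CPU-only machine: nothing to release
    for i in range(torch.cuda.device_count()):
        with torch.cuda.device(i):  # make GPU i the current device
            # Wait for all queued work on GPU i. If this call hangs, the stall
            # probably comes from outstanding kernels or collectives on that
            # GPU rather than from the cache release itself (assumption).
            torch.cuda.synchronize(i)
            before = torch.cuda.memory_reserved(i)
            torch.cuda.empty_cache()
            after = torch.cuda.memory_reserved(i)
        report[i] = (before, after)
        print(f"cuda:{i} reserved {before} -> {after} bytes")
    return report


if __name__ == "__main__":
    empty_cache_per_device()
```

If the job stalls at synchronize rather than at empty_cache, that would suggest one GPU is still waiting on work (for example, a collective that the other GPU never joined).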