P2P CUDA-aware MPI problem

seliad · December 22, 2019, 11:08am

Hello,
I’m trying to use Isends with CUDA-aware Open MPI.

I found that I need to explicitly call torch.cuda.synchronize(device) before every Isend; otherwise the training error collapses. The problem occurs even when I stash the sent tensor, keeping a reference to it so that its memory is not freed and overwritten.
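The workaround described above (synchronize the device, then isend, and keep the tensor referenced until the send completes) can be sketched roughly as follows. This is a minimal sketch, assuming the MPI backend of torch.distributed is already initialized and that `dist.isend` is the Isend in question. The `PendingSends` class and its `send_fn` parameter are hypothetical names for illustration only; `send_fn` exists just so the bookkeeping can be exercised without an MPI job.

```python
import torch
import torch.distributed as dist


class PendingSends:
    """Hypothetical helper: keeps sent tensors alive until their isend completes."""

    def __init__(self, send_fn=None):
        # Defaults to torch.distributed.isend; injectable for testing (assumption).
        self._send_fn = send_fn if send_fn is not None else dist.isend
        self._pending = []  # list of (request, tensor) pairs

    def isend(self, tensor, dst):
        if tensor.is_cuda:
            # The workaround from the post: block the host until all queued
            # work on this GPU has finished, so the buffer holds its final
            # values before CUDA-aware MPI starts reading it.
            torch.cuda.synchronize(tensor.device)
        req = self._send_fn(tensor, dst=dst)
        # Stash the tensor next to its request so its memory is not freed
        # and reused by the caching allocator while the send is in flight.
        self._pending.append((req, tensor))
        return req

    def wait_all(self):
        # Wait for every outstanding send, then drop the stashed references.
        for req, _ in self._pending:
            if req is not None:  # isend returns None if this rank is not in the group
                req.wait()
        self._pending.clear()

    def __len__(self):
        return len(self._pending)
```

Note that torch.cuda.synchronize(device) waits for all streams on the device, which is the coarse-grained behavior the post describes needing.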

I have tried it in two different settings:

with P2P-enabled GPUs (GTX 1080), and
without P2P-enabled GPUs (RTX 2080 Ti); in the latter case, the sends must go through the host.

What could be happening here?
(I am using a single thread with asynchronous operations.)