PyTorch distributed: ephemeral port communication after rendezvous

Hi Shen Li / PyTorch team,

After the rendezvous, is there a way to restrict which peers can connect to each other in the P2P alignment? For example, if there are 3 peers, can I restrict communication to p1-p2 and p1-p3 exclusively? As far as I can tell, the process automatically opens ephemeral TCP ports to talk to all peers, even though I programmatically restricted data communication between p2 and p3. Your response is greatly appreciated.

Thanks for the question! Is this for RPC or for collective communication libraries (e.g. NCCL)? The rendezvous is similar, but the backend setup / communication will behave differently depending on what you're using.

I am using RPC with the TensorPipe backend. Can I have this scenario with the RPC TensorPipe backend, without connections between client1, client2, and client3? All connections should go only through central.

Yes, that is possible. You can call init_rpc("central", rank=0, world_size=4) on the central process, and on the rest init_rpc("client1", rank=1, ...), init_rpc("client2", rank=2, ...), and so forth.
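A minimal sketch of that star topology, assuming all four processes run the same script and MASTER_ADDR / MASTER_PORT are set for the rendezvous (the RANK env var and the ping example are illustrative, not required by the API):

```python
import os
import torch.distributed.rpc as rpc

def ping(msg):
    # Runs on "central"; must be importable on both ends, which holds
    # here because every process executes the same script.
    return f"central received: {msg}"

world_size = 4
rank = int(os.environ["RANK"])  # assumed to be set by your launcher

if rank == 0:
    # The trusted coordinator; clients only ever target this worker by name.
    rpc.init_rpc("central", rank=0, world_size=world_size)
else:
    rpc.init_rpc(f"client{rank}", rank=rank, world_size=world_size)
    # Application traffic flows client -> central only:
    print(rpc.rpc_sync("central", ping, args=(f"hello from client{rank}",)))

rpc.shutdown()
```

With this usage pattern, no client ever addresses another client by name, so all application-level RPCs go through central.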

The only caveat is that I believe during rpc.shutdown() each rank will do a 1:1 communication to validate that there are no unresolved messages.
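If that final validation turns out to be the only blocker, one possible workaround (a sketch of an assumption, not a guaranteed fix) is a non-graceful shutdown, which returns locally without coordinating with other workers:

```python
import torch.distributed.rpc as rpc

# After your own out-of-band confirmation (e.g. via central) that all
# outstanding work has completed, shut down locally without blocking on
# or validating against the other peers:
rpc.shutdown(graceful=False)
```

The trade-off is that you lose the built-in guarantee that no messages are still in flight, so you would need to confirm completion yourself through central.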

Sure, that is exactly the problem. In my scenario, all three clients are contractually untrusted, and I cannot allow any connection between client1, client2, and client3. The only connection permitted is to central, over TLS 1.3+ with a strong cipher; I have the security part figured out. When I analyze tcpdumps, TensorPipe is opening random ephemeral ports over TCP, and without allowing those, the connections fail with a Gloo socket connection error. Is there any other way I can still run PyTorch RPC without connecting client1, client2, and client3 to each other?