How to pass a Python object to a custom C++ operator

I followed the tutorial to add a custom C++ operator. In the middle of the operator, I want to all-reduce a tensor across all processes. At the beginning of my Python code, I call dist.init_process_group before doing torch.nn.parallel.DistributedDataParallel training and other communication. How can I pass the process-group information to my custom C++ operator? Only after getting this information can I call the NCCL function with the right communicator and its corresponding stream.
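
For reference, here is a minimal sketch of the Python-side equivalent of what I want to do inside the operator. It is an illustration under assumptions, not my actual training code: it uses the gloo backend with a single process and file-based rendezvous so that it runs without GPUs, whereas the real setup uses NCCL across several processes. The helper names allreduce_in_python and run_single_process_demo are hypothetical.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def allreduce_in_python(values):
    """Sum `values` element-wise across all ranks of the default process group."""
    tensor = torch.tensor(values, dtype=torch.float32)
    # No `group` argument is given, so the default group created by
    # init_process_group is used. This is the group whose information
    # I would like to reach from inside the C++ operator.
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    return tensor


def run_single_process_demo():
    # Assumption: one process, gloo backend, file-based rendezvous,
    # chosen only so the sketch runs on a CPU-only machine.
    with tempfile.TemporaryDirectory() as tmp:
        init_file = os.path.join(tmp, "rendezvous")
        dist.init_process_group(
            backend="gloo",
            init_method=f"file://{init_file}",
            rank=0,
            world_size=1,
        )
        try:
            return allreduce_in_python([1.0, 2.0, 3.0])
        finally:
            # Tear down the group before the temporary directory is removed.
            dist.destroy_process_group()
```

With a single rank the sum leaves the tensor unchanged. In the real job the same call would run on every rank with backend="nccl" and CUDA tensors, and this is the step I want to move into the C++ operator.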