Torch distributed launch & Flask API

We have built an inference pipeline that takes advantage of multiple GPUs. We pass, say, a single image to it using a shell script, and it returns some results. The shell script contains the following:

CUDA_VISIBLE_DEVICES=1,2 python3 -m torch.distributed.launch --master_port 9800 --nproc_per_node=2

In the script we have:

dist.init_process_group(backend="nccl", init_method="env://")

The code loads a separate model onto each GPU, runs them, and then exits. Now we want to convert this script into a Flask API, so that users can send images to it from a front-end web UI with a POST request. So we made the following changes to the script:

@app.route("/URL", methods=["POST"])
def search_engine():
    if request.method == "POST": 
        result = run_multi_GPU_code(request)
        return jsonify(result)

if __name__ == "__main__":
    dist.init_process_group(backend="nccl", init_method="env://")
    app.run(host="", debug=True)

and I get:

  File "", line 55, in <module>
    dist.init_process_group(backend="nccl", init_method="env://")
  File "/usr/local/lib64/python3.8/site-packages/torch/distributed/", line 500, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/usr/local/lib64/python3.8/site-packages/torch/distributed/", line 190, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use

Any help is appreciated.

This typically means the port (I guess 8115) is already in use by a different process. You can find the process using that port by running this command and looking for a line with LISTEN:

netstat -anp | grep 8115
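
The same check can be scripted. This is a minimal sketch, assuming the two ports of interest are the Flask port guessed above (8115) and the --master_port from the launch command (9800), since the traceback is raised while torch.distributed creates its TCPStore. port_is_free is a hypothetical helper name, not part of Flask or PyTorch:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Bind failed, e.g. EADDRINUSE: something already holds the port.
            return False
        return True


if __name__ == "__main__":
    # Assumed ports: 8115 (guessed Flask port), 9800 (--master_port).
    for port in (8115, 9800):
        print(port, "free" if port_is_free(port) else "in use")
```

Running this just before the failing call would show which of the two ports is actually taken.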

Thank you for your response. I have already done this: I changed the port and checked whether torch distributed uses 8115. It does not, so the problem does not seem to be related to the port. I suspect the cause is that the Flask API launches a server and torch distributed also launches a server, and the two do not work well together. Another question: I want to run inference in a distributed fashion, that is, load several instances of a model and pass a segment of the data to each of them (map), then collect the predictions from all of the models (reduce). Does PyTorch have any tool or API to support this inference pipeline?
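
For the map/reduce part of the question, one possible pattern is sketched here. It assumes every rank has already called dist.init_process_group and holds its own copy of the model. shard_for_rank, merge_shards and distributed_predict are hypothetical helper names; torch.distributed.all_gather_object is the PyTorch collective used to collect picklable results from all ranks. This illustrates the idea under those assumptions and is not a tested pipeline for the setup above:

```python
def shard_for_rank(items, rank, world_size):
    """Map step: rank r takes items r, r + world_size, r + 2 * world_size, ..."""
    return items[rank::world_size]


def merge_shards(shards):
    """Reduce step: interleave per-rank results back into the input order."""
    merged = [None] * sum(len(s) for s in shards)
    for rank, shard in enumerate(shards):
        merged[rank::len(shards)] = shard
    return merged


def distributed_predict(model, items):
    """Call on every rank with the same `items`; each rank returns all predictions."""
    # Imported lazily so the two helpers above can be used without torch.
    import torch.distributed as dist

    rank, world_size = dist.get_rank(), dist.get_world_size()
    # Each rank runs its own model copy on its own slice of the data.
    # Predictions must be picklable (e.g. CPU tensors or plain Python objects).
    local_preds = [model(x) for x in shard_for_rank(items, rank, world_size)]

    # Collect every rank's list. With the nccl backend, torch.cuda.set_device
    # must have been called for this rank before using object collectives.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_preds)
    return merge_shards(gathered)
```

With two ranks and five inputs, rank 0 would process inputs 0, 2 and 4 and rank 1 would process inputs 1 and 3. If only one rank needs the combined result, dist.gather_object could be used in place of all_gather_object.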