We have built an inference pipeline that takes advantage of multiple GPUs. We pass, say, a single image to inference.py
via a shell script, and it returns some results. The shell script contains the following:
CUDA_VISIBLE_DEVICES=1,2 python3 -m torch.distributed.launch --master_port 9800 --nproc_per_node=2 inference.py
In inference.py we have:
dist.init_process_group(backend="nccl", init_method="env://")
The code loads a separate model onto each GPU, runs them, and then exits. Now we want to convert this script into a Flask API, so that users can pass images to inference.py from a front-end web UI via a POST request. So we made the following changes to inference.py:
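For context, each of the two processes spawned by torch.distributed.launch has to discover its local rank to pick a GPU. Roughly like the sketch below (get_local_rank is a hypothetical helper for illustration, not our actual code):

```python
import os
import sys

def get_local_rank(argv=None, environ=None):
    # Hypothetical helper showing how a process started by
    # torch.distributed.launch can discover which GPU it owns.
    # Older launcher versions pass "--local_rank=N" on the command line;
    # with --use_env (or torchrun) the value arrives in the LOCAL_RANK
    # environment variable instead.
    environ = os.environ if environ is None else environ
    if "LOCAL_RANK" in environ:
        return int(environ["LOCAL_RANK"])
    argv = sys.argv if argv is None else argv
    for arg in argv:
        if arg.startswith("--local_rank="):
            return int(arg.split("=", 1)[1])
    return 0

# Each process would then do something like:
#   torch.cuda.set_device(get_local_rank())
```

With CUDA_VISIBLE_DEVICES=1,2, local ranks 0 and 1 map onto physical GPUs 1 and 2.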
from flask import Flask, request, jsonify
import torch.distributed as dist

app = Flask(__name__)

@app.route("/URL", methods=["POST"])
def search_engine():
    if request.method == "POST":
        result = run_multi_GPU_code(request)
        return jsonify(result)

if __name__ == "__main__":
    dist.init_process_group(backend="nccl", init_method="env://")
    app.run(port=8115, host="0.0.0.0", debug=True)
and I get:
File "inference.py", line 55, in <module>
dist.init_process_group(backend="nccl", init_method="env://")
File "/usr/local/lib64/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 500, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/usr/local/lib64/python3.8/site-packages/torch/distributed/rendezvous.py", line 190, in _env_rendezvous_handler
store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
Any help is appreciated.
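For what it's worth, the RuntimeError itself is just a TCP port bind failure: with init_method="env://", rank 0 opens a TCPStore listener on MASTER_PORT, and any second listener on the same port fails the same way. A plain-socket sketch (port 9800 here only mirrors our --master_port):

```python
import socket

def try_bind(port):
    # Try to listen on the given TCP port, as the TCPStore server does.
    # Returns (socket, None) on success or (None, OSError) on failure.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return s, None
    except OSError as e:
        s.close()
        return None, e

first, _ = try_bind(9800)     # first listener takes the port
second, err = try_bind(9800)  # expected to fail with EADDRINUSE
                              # ("Address already in use") when the
                              # first bind succeeded
```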