We have built an inference pipeline that takes advantage of multiple GPUs. We launch inference.py with a shell script, passing it, say, a single image, and it returns some results. The code loads a separate model onto each GPU, runs them, and then exits. Now we want to convert this script into a Flask API, so that users can pass images to inference.py from a front-end web UI via a POST request. So we made the following changes to inference.py:
@app.route("/URL", methods=["POST"])
def search_engine():
    if request.method == "POST":
        result = run_multi_GPU_code(request)
        return jsonify(result)

if __name__ == "__main__":
    dist.init_process_group(backend="nccl", init_method="env://")
    app.run(port=8115, host="0.0.0.0", debug=True)
and I get:
File "inference.py", line 55, in <module>
dist.init_process_group(backend="nccl", init_method="env://")
File "/usr/local/lib64/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 500, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/usr/local/lib64/python3.8/site-packages/torch/distributed/rendezvous.py", line 190, in _env_rendezvous_handler
store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use
This typically means the port (I guess 8115) is already in use by a different process. You can find the process using that port by running this command and looking for a line with LISTEN:
Thank you for your response. I have already done this: I changed the port and checked whether torch.distributed uses 8115, and it does not, so the error does not seem to be related to the port. I suspect it is because the Flask API launches a server and torch.distributed also launches a server, and the combination of the two does not work well. Another question: I want to run inference in a distributed fashion, that is, load several instances of a model and pass a segment of the data to each (propagate or map the models and data), then collect the predictions from all models (reduce). Does PyTorch have any tool or API to support this inference pipeline?
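The map/reduce pattern I have in mind could be sketched with torch.distributed collectives. This is only a sketch: it runs as a single process on CPU with the gloo backend standing in for the real multi-GPU NCCL setup, and `run_sharded_inference` and the doubling "model" are placeholder names I made up.

```python
import os
import torch
import torch.distributed as dist

def run_sharded_inference():
    # In real use, torch.distributed.launch / torchrun sets these per rank;
    # they are defaulted here so the sketch runs as a single CPU process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    dist.init_process_group(backend="gloo", init_method="env://")

    rank, world_size = dist.get_rank(), dist.get_world_size()

    # "Map": every rank takes its own shard of the batch.
    full_batch = torch.arange(8, dtype=torch.float32)
    shard = full_batch.chunk(world_size)[rank]

    # Stand-in for the per-rank model forward pass.
    preds = shard * 2

    # "Reduce": collect every rank's predictions on all ranks.
    gathered = [torch.zeros_like(preds) for _ in range(world_size)]
    dist.all_gather(gathered, preds)

    dist.destroy_process_group()
    return torch.cat(gathered).tolist()

if __name__ == "__main__":
    print(run_sharded_inference())
```

With more than one rank, each process would compute only its shard, and `all_gather` would hand every process the full set of predictions.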
The main reason is that when you use torch.distributed.launch to run the model in parallel on 2 devices, Python spawns one process per device, and each process runs every line in the script.
This becomes an issue at app.run(port=8115), where all the processes try to bind the same port to launch their own servers.
Imagine process 0 calls app.run on port 8115 first and succeeds.
Process 1 then tries to use port 8115 as well, but the port is already taken by process 0.
That’s where the RuntimeError: Address already in use comes from.
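The collision itself has nothing to do with Flask or PyTorch specifically; two plain sockets binding the same port reproduce it. A minimal demo (the OS picks the first port, so the test port is guaranteed to be free beforehand):

```python
import socket

def reproduce_port_conflict():
    # First server (think: process 0's app.run) binds successfully.
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind(("127.0.0.1", 0))      # port 0 = let the OS pick a free port
    port = first.getsockname()[1]
    first.listen()

    # Second server (process 1) tries the very same port.
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))
        conflict = False
    except OSError:                   # EADDRINUSE: "Address already in use"
        conflict = True
    finally:
        second.close()
        first.close()
    return conflict

if __name__ == "__main__":
    print(reproduce_port_conflict())  # True
```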
I ran into this issue as well. I know where it comes from, but I don’t know how to solve it.
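Two workarounds seem plausible, though I have not tested either in a real multi-rank deployment: give every rank its own port, or let only rank 0 expose the Flask API and forward work to the other ranks via collectives. A sketch (`port_for_rank` and `should_serve` are names I made up):

```python
import os

BASE_PORT = 8115

def port_for_rank(base_port: int, rank: int) -> int:
    # Option A: every rank serves on its own port, so the
    # app.run() calls never collide on one address.
    return base_port + rank

def should_serve(rank: int) -> bool:
    # Option B: only rank 0 exposes the HTTP API; the other ranks
    # skip app.run() entirely and wait for work (e.g. via dist.broadcast).
    return rank == 0

if __name__ == "__main__":
    rank = int(os.environ.get("RANK", "0"))   # set per process by the launcher
    if should_serve(rank):
        # app.run(host="0.0.0.0", port=port_for_rank(BASE_PORT, rank))
        print(f"rank {rank} would serve on port {port_for_rank(BASE_PORT, rank)}")
```

Option B keeps a single public endpoint but requires rank 0 to distribute each incoming request to the other ranks itself; Option A is simpler but needs a load balancer or client-side logic to pick a port.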