Computationally Heavy Computer Vision Algorithms on a Remote Server

I have a general question regarding running computationally heavy computer vision algorithms on remote servers. So far I have been working with smaller datasets that take only a few hours to run, and yet I have faced many situations where the process broke down for a multitude of reasons: maybe it was my internet connection, maybe VSCode got disconnected, maybe it was the VPN. I wonder how people set up their computational environments so they can run a process without trouble for days. Other than checkpoints, which methods are common practice? I would be glad to hear your opinions.

Hi Samin!

Let me assume that your remote server is running a flavor of unix (e.g.,
linux) and that you can open a remote shell on it (e.g., via ssh) from
your local computer.

You want to run the remote job so that it is not dependent on the connection
between your local computer and your remote server.

In the unix world, you run your desired remote command using “nohup”
(which comes from “no hang-up”), redirecting both stdout and stderr to a
file and putting the job in the background with a trailing “&”:

nohup python my_remote_job_python_script.py &> my_remote_job_output.txt &
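If you expect to come back to the job later, it can also help to record
its process id right after launching it. A minimal sketch (the pid-file
name is just a placeholder):

echo $! > my_remote_job.pid    # $! holds the pid of the last backgrounded command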

You can then watch the progress of your remote job by running (in a remote
shell):

tail -f my_remote_job_output.txt

If your connection goes down, you just restart your connection, start a
new remote shell, and go back to watching your job by running tail -f
again. (If your remote python script crashes or your remote server blows
up, you’re in the soup, but, hey, at least your local computer can get struck
by lightning …)
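As an aside, after reconnecting you can first check that the job is still
alive. Assuming you saved the pid as sketched above:

kill -0 "$(cat my_remote_job.pid)" && echo running || echo not running

(“kill -0” sends no signal; it only tests whether the process still exists.)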

Best.

K. Frank