I know the init process takes longer on machines with more GPUs, but since we call the PyTorch script from external scripts, it has become a bottleneck for our process.
No, you won’t be able to cache anything. Part of the init time is spent creating the CUDA context, which loads the driver, the native PyTorch kernels, and the kernels from the CUDA math libraries (cuBLAS, cuDNN, etc.), and more time is needed to load the actual data onto the device.
If you are using CUDA 11.7+, you can activate lazy module loading via CUDA_MODULE_LOADING=LAZY, which avoids pre-loading every kernel and instead loads each one lazily once it’s needed. This will speed up the init process but will add a small overhead for each new kernel, as it has to be loaded into the CUDA context before its first execution.
You would need to build PyTorch with CUDA 11.7 (or install the nightly binaries) to see the effect. Installing the CUDA toolkit 11.7 on your system while using a PyTorch binary built with another CUDA runtime (e.g. 11.6) will not work. You should also see a reduction in the memory used by the CUDA context, which you could use to double check that lazy loading is working.
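A minimal sketch of how the external caller could set the variable and how the memory check could look. The helper names (lazy_loading_env, run_script, report_context_memory) are made up for illustration, and the memory number is device-wide, so it is only meaningful on an otherwise idle GPU:

```python
import os
import subprocess
import sys


def lazy_loading_env(base_env=None):
    """Return a copy of the environment with lazy CUDA module loading enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Must be set before the process initializes CUDA.
    env["CUDA_MODULE_LOADING"] = "LAZY"
    return env


def run_script(script_path, *args):
    """Launch a Python script (hypothetical path) with lazy loading enabled."""
    return subprocess.run(
        [sys.executable, script_path, *args],
        env=lazy_loading_env(),
        check=True,
    )


def report_context_memory():
    """Print the CUDA version PyTorch was built with and the device memory in use.

    Assumes PyTorch and a GPU are available; torch is imported here so the
    helpers above still work without it.
    """
    import torch

    print("PyTorch built with CUDA:", torch.version.cuda)
    torch.zeros(1, device="cuda")  # forces CUDA context creation
    free, total = torch.cuda.mem_get_info()
    print(f"device memory in use: {(total - free) / 2**20:.0f} MiB")
```

Running report_context_memory() once with and once without CUDA_MODULE_LOADING=LAZY in the environment should show the difference in context size if lazy loading is active.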
I don’t fully understand the second point, so could you explain which scripts are supposed to stay in memory?
Thank you for the suggestions. It makes sense, I will try it.
Currently we have a web server which loads PHP scripts, and these execute the Python (PyTorch) scripts through a shell-type interface. But this creates a bottleneck because each reload takes too much time, i.e. the actual work is done in 10s but loading takes another 10s.
So it will probably be much more efficient to load the Python process once and leave it in memory, where it will take requests, do the compute and return the results.
Ah yes, thanks for the explanation.
You are right, and I would also suggest initializing the Python/PyTorch process once and reusing it for new requests. I’m not sure which web server application you are using, but would it be possible to write a startup and teardown method which would keep the PyTorch process alive?
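A rough sketch of what such a long-lived worker could look like, assuming the PHP side can write one JSON request per line to the process and read one JSON response per line back. The Worker class, the request format and the doubling "model" are placeholders, not anything from your setup; the real startup would load the model and create the CUDA context there:

```python
import json


class Worker:
    """Holds the expensive state for the lifetime of the process."""

    def __init__(self):
        self.model = None
        self.load_count = 0

    def startup(self):
        # One-time, expensive initialization (placeholder for loading the
        # real PyTorch model and moving it to the GPU).
        self.load_count += 1
        self.model = lambda values: [v * 2 for v in values]

    def handle(self, request):
        # Per-request work reuses the already initialized model.
        return {"result": self.model(request["input"])}

    def teardown(self):
        # Release resources when the server shuts the worker down.
        self.model = None


def serve(worker, instream, outstream):
    """Initialize once, then answer one JSON request per input line."""
    worker.startup()
    try:
        for line in instream:
            line = line.strip()
            if not line:
                continue
            response = worker.handle(json.loads(line))
            outstream.write(json.dumps(response) + "\n")
            outstream.flush()
    finally:
        worker.teardown()
```

Calling serve(Worker(), sys.stdin, sys.stdout) from the script's entry point would keep the process alive until its stdin is closed, so the init cost is paid once instead of on every request. A local socket or a small HTTP server would work the same way if pipes are awkward to manage from PHP.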