Model idle time on GPU

Initial situation: I have written an API to serve my object detection models. The model is loaded onto the GPU, and inferences are then executed against it. However, if there is a break between requests (say, 5 minutes), the first inference after the break takes significantly longer.

My question: Are there certain parameters to prevent this? What is happening with the model on the GPU?


Your GPU might go into an idle state, and waking it up can take some time. You could enable the persistence daemon as described in the docs:

This approach prevents the kernel module from fully unloading software and hardware state when no user software is using the GPU.
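As a sketch of how this could look on a Linux box (assuming the NVIDIA driver and its tools are installed; both commands require root):

```shell
# Check the current persistence mode of all GPUs.
nvidia-smi --query-gpu=persistence_mode --format=csv

# Option 1 (legacy): enable persistence mode via nvidia-smi.
sudo nvidia-smi -pm 1

# Option 2 (recommended by NVIDIA over the legacy flag):
# run the persistence daemon, which keeps the driver state
# initialized even when no clients are connected.
sudo nvidia-persistenced
```

Note that persistence mode keeps the driver loaded but does not keep your model warm; if the slowdown persists, a dummy warm-up inference after idle periods may also help.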
