Stop inference on GPU

Is there a way to stop an inference process running on the GPU?

Description: I load an object detection model onto the GPU and then want to run inferences on it. Is there a way to stop the inference process on the GPU (for example, after a specified time)? The model should remain on the GPU, though, so that it doesn't have to be reloaded for the next run.
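
For what it's worth, here is a rough host-side sketch of the kind of thing I'm after, assuming a PyTorch detection model (the specific model, timeout, and chunk size below are just placeholders): inference is launched in small chunks, the elapsed time is checked between chunks, and the loop stops early while the model stays resident on the GPU.

```python
import time
import torch
import torchvision

# Load a detection model onto the GPU once; it stays resident afterwards.
device = torch.device("cuda")
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval().to(device)

def run_with_timeout(images, timeout_s=2.0, chunk_size=4):
    """Run inference chunk by chunk; stop between chunks once timeout_s elapses.

    Work already enqueued for the current chunk still finishes, but no new
    chunks are launched. The model remains on the GPU for the next call.
    """
    results = []
    start = time.monotonic()
    with torch.no_grad():
        for i in range(0, len(images), chunk_size):
            if time.monotonic() - start > timeout_s:
                break  # stop launching new work; the model stays loaded
            chunk = [img.to(device) for img in images[i:i + chunk_size]]
            out = model(chunk)
            torch.cuda.synchronize()  # make the elapsed-time check meaningful
            results.extend(out)
    return results
```

This doesn't abort kernels that are already enqueued; it only stops launching new ones, which seems to be the best a purely host-side approach can do.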

CUDA is asynchronous, so I'm not sure there's a way to stop an execution from the host once the kernels have been launched.
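
To illustrate the asynchronicity, a minimal sketch, assuming PyTorch on a CUDA device: the launch call returns almost immediately, and the host only learns that the kernel has finished after synchronizing; there is no host-side call in between that cancels the already-enqueued work.

```python
import time
import torch

a = torch.randn(4096, 4096, device="cuda")
_ = a @ a                      # warm-up so CUDA context creation doesn't skew timing
torch.cuda.synchronize()

# The matmul below is only enqueued; the call returns before the kernel runs.
t0 = time.monotonic()
b = a @ a
print(f"launch returned after {time.monotonic() - t0:.6f} s")

# Only after synchronizing do we know the kernel actually finished;
# there is no host-side call in between to cancel the enqueued work.
torch.cuda.synchronize()
print(f"kernel finished after {time.monotonic() - t0:.6f} s")
```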

Could you elaborate on your use case? Are you trying to vary the batch size based on throughput while a model is running? I generally profile offline and then keep the batch size fixed at a performance threshold I know.