CUDA error: unable to allocate GPU memory


I have built a semantic segmentation model and am trying to deploy it on a machine with a 16GB GPU and 4 vCPUs.

When I set the number of workers to 4, I get a CUDA memory error. I then reduced the number of workers to 2; that worked, and I was able to deploy and host the model as an API.

Can anyone tell me how to fully utilise 4 workers to serve 4 concurrent API calls?

@ptrblck #deployment #vision

The number of workers in a DataLoader should not cause “CUDA Memory errors”, so could you explain which error you saw and post a minimal, executable code snippet to reproduce the issue?
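By a minimal, executable snippet I mean a stripped-down script along these lines, where the stand-in dataset and model are placeholders you would replace with code that actually triggers the error:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real dataset and model; replace with your own
# segmentation setup so the script reproduces the reported error.
dataset = TensorDataset(torch.randn(8, 3, 64, 64))
loader = DataLoader(dataset, batch_size=2, num_workers=4)

model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

with torch.no_grad():  # inference only, no gradient buffers
    for (x,) in loader:
        out = model(x.to(device))
# out now holds the logits for the last batch
```

A script like this lets me run the exact code path (including the `num_workers=4` setting) and see the full error message with its stack trace.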

Thanks for your reply @ptrblck. I'm doing real-time inferencing here, so I pass only one image at a time. I therefore believe the number of workers in the DataLoader during inference is 1.

Please correct me if I'm wrong.

I’m not sure how a DataLoader would fit into a real-time inference use case, but in any case no “memory errors” should be raised, so I would still need more information about the actual error and a minimal code snippet to be able to debug it.