Hi all,
I am currently working on deploying a uvicorn model inference API to Kubernetes, and I am running into a peculiar problem. According to torch.cuda.is_available(), CUDA is ready to be used (it returns True), but as soon as I try to run inference with the model, I get an internal server error telling me that there was an illegal memory access.
For context, I am trying to find license plates on grayscale images.
I cannot seem to get CUDA_LAUNCH_BLOCKING=1 working in the container, no matter whether I set it through the container's environment variables or through os.environ, so all I get is the bare “illegal memory access” error.
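For completeness, this is roughly how I tried to set it from Python; my understanding is that the variable has to be set before the CUDA context is created, i.e. before the first CUDA call (the placement shown here is just what I attempted, not a confirmed fix):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch  # imported only after the variable is set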
Here are a few things I have already tried:
- Different CUDA versions (11.3 and 11.6 specifically)
- Garbage collection and clearing the cache
I have also tried transforming the numpy array to a tensor with the torch.from_numpy function, but that gave me a ValueError: not enough values to unpack (expected 4, got 2) error.
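My assumption (untested) is that this unpack error simply means the grayscale image is a 2-D (H, W) array, while the raw model internally unpacks a 4-D (N, C, H, W) batch. Something like the sketch below would at least give the tensor the expected rank, although the model presumably still expects 3 channels, so treat this as a guess about where the error comes from rather than a fix:

# Guess: add the missing channel and batch dimensions to the 2-D grayscale array
img_tensor = (
    torch.from_numpy(img_equalized)
    .float()
    .unsqueeze(0)   # channel dimension -> (1, H, W)
    .unsqueeze(0)   # batch dimension   -> (1, 1, H, W)
    .to(device)
)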
This runs in a Kubernetes pod where I have deployed a Docker image from a private container registry. Here is the output of python -m torch.utils.collect_env:
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.27.2
Libc version: glibc-2.31
Python version: 3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-1112-azure-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 470.82.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[conda] numpy 1.25.2 pypi_0 pypi
[conda] torch 1.12.1+cu116 pypi_0 pypi
[conda] torchaudio 0.12.1+cu116 pypi_0 pypi
[conda] torchvision 0.13.1+cu116 pypi_0 pypi
Here is a minimal version of the main script, which throws the same error:
import torch
import base64
import cv2
import numpy as np

# Load the custom YOLOv5 weights via torch.hub
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best_1024.pt', force_reload=False)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

if torch.cuda.is_available():
    torch.cuda.empty_cache()


def find_plate(image_content):
    img_equalized = read_request_image(image_content)
    img_tensor = torch.from_numpy(img_equalized).float().to(device)  # throws ValueError if used as input for the model: expected 4 values, got 2
    results = model(img_equalized, size=1024)
    return {
        'results': str(results),
        'device': device,
        'tensor': img_tensor.device
    }


def read_request_image(image_content):
    im_bytes = base64.b64decode(image_content)
    im_arr = np.frombuffer(im_bytes, dtype=np.uint8)  # im_arr is a one-dimensional numpy array
    img = cv2.imdecode(im_arr, flags=cv2.IMREAD_GRAYSCALE)
    return img


if __name__ == '__main__':
    plate_base64 = '<base64 of image containing license plate>'
    result = find_plate(plate_base64)
    print(result)
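For context, in the real deployment find_plate is exposed through a FastAPI app served by uvicorn, roughly like this (simplified; the endpoint and field names here are placeholders, not the actual code):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PlateRequest(BaseModel):
    image_content: str  # base64-encoded grayscale image

@app.post("/find_plate")
def find_plate_endpoint(request: PlateRequest):
    # stringify values so torch.device objects serialize cleanly to JSON
    return {k: str(v) for k, v in find_plate(request.image_content).items()}

# served with: uvicorn main:app --host 0.0.0.0 --port 8000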
If I turn off the GPU for the container, the main script works without issues, but then everything obviously runs on the CPU and is much slower. What can I do to resolve this issue?
Many thanks in advance; this is the last thing holding up our deployment, so any help would be very much appreciated.