Hi! I want to run inference with a trained model on the GPU. An asynchronous CUDA error is raised if I feed the images one by one (batch_size = 1). However, if I feed all the images at once (batch_size set to the number of images), inference works fine. Does anyone know the reason? Thank you!
Here is my code:
class myDataset(Dataset):
    def __init__(self, images, transform):
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = cv.resize(self.images[idx], (256, 256))
        image = self.transform(image)
        return image
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
If I set batch_size = 1:
imgLoader = DataLoader(myDataset(images, transform), shuffle=False, batch_size=1)
masks = []
for img in imgLoader:
    img_cuda = img.to(device)
    print('---------\n')
    masks.append(unet(img_cuda))
    print('---------\n')
This raises the following error:
---------
---------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_2612/2454743386.py in <module>
1 masks = []
2 for img in imgLoader:
----> 3 img_cuda = img.to(device)
4 print('---------\n')
5 masks.append(unet(img_cuda))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
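Since the kernel error is reported asynchronously, the traceback above may point at the wrong call (here it blames img.to(device)). A minimal sketch for getting a synchronous, accurate stack trace, assuming the interpreter is restarted, since the variable must be set before CUDA is initialized:

```python
import os

# Must run before the first torch import / first CUDA call in the process.
# Equivalent to launching with: CUDA_LAUNCH_BLOCKING=1 python script.py
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

With this set, each kernel launch blocks until completion, so the traceback points at the call that actually failed.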
If I set batch_size = 21, which is the total number of my images:
imgLoader = DataLoader(myDataset(images, transform), shuffle=False, batch_size=21)
#masks = []
for img in imgLoader:
    img_cuda = img.to(device)
    print('---------\n')
    masks = unet(img_cuda)
    print('---------\n')
Everything works fine. The output is:
---------
---------
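For reference, a self-contained sketch of the one-by-one loop that avoids keeping autograd graphs and GPU results alive across iterations (torch.no_grad() plus moving each result to the CPU); the Conv2d model and random data below are hypothetical stand-ins for my unet and imgLoader:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the trained UNet; substitute the real model.
unet = nn.Conv2d(3, 1, kernel_size=3, padding=1).to(device)
unet.eval()

# Stand-in data matching the 256x256 preprocessing in the question.
loader = DataLoader(torch.randn(4, 3, 256, 256), batch_size=1, shuffle=False)

masks = []
with torch.no_grad():            # no autograd graph is recorded
    for img in loader:
        out = unet(img.to(device))
        masks.append(out.cpu())  # move the result off the GPU each iteration
masks = torch.cat(masks, dim=0)
print(masks.shape)               # torch.Size([4, 1, 256, 256])
```

Since the batch_size = 1 version appends GPU tensors to a Python list, this variant also keeps per-iteration GPU memory roughly constant, which matters on a card that nvidia-smi already shows as nearly full.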
Here are the versions of PyTorch and the GPU driver:
torch.__version__
>> '1.12.0'
torch.version.cuda
>> '11.3'
!nvidia-smi
>>
Sun Jul 3 19:52:09 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 73C P0 73W / 149W | 11275MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2666 C ...vs/pytorch_env/bin/python 11270MiB |
+-----------------------------------------------------------------------------+