Hello everyone,
recently I created a script that uses a Mask R-CNN network to run instance segmentation repeatedly, over many images.
I do the setup like this:
import torch
import torchvision
from torchvision.models.detection import MaskRCNN_ResNet50_FPN_Weights

device = None
model = None

def init_maskRCNN():
    global device
    global model
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        weights=MaskRCNN_ResNet50_FPN_Weights.DEFAULT).to(device)
    model.eval()
This runs fine and device is set to 'cuda:0' (RTX 3060, 12 GB VRAM). Then I repeatedly call an inference function, which (simplified) goes like this:
from PIL import Image
import torchvision.transforms as T

def inference_maskRCNN(path):
    img = Image.open(path)
    trans = T.Compose([T.ToTensor()])
    img = trans(img)
    img = img.to(device)
    prediction = model([img])
    scores = prediction[0]['scores']
    # nothing detected, or even the best score is below THRESHOLD (defined elsewhere in the script)
    if scores.numel() == 0 or scores[0] < THRESHOLD:
        del img
        del prediction
        return [], [], []
    prediction_score = list(scores.detach().cpu().numpy())
    # scores come sorted in descending order, so take the index of the last one above THRESHOLD
    pred_t = [prediction_score.index(x) for x in prediction_score if x > THRESHOLD][-1]
    if len(prediction[0]['masks']) != 1:
        masks = (prediction[0]['masks'] > 0.5).squeeze().detach().cpu().numpy()
    else:
        # with a single instance, squeeze() would also drop the instance dimension
        masks = (prediction[0]['masks'] > 0.5).detach().cpu().numpy()
        masks = masks[0, :, :, :]
    prediction_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(prediction[0]['labels'].cpu().numpy())]
    pred_boxes = [[(i[0], i[1]), (i[2], i[3])] for i in list(prediction[0]['boxes'].detach().cpu().numpy())]
    masks = masks[:pred_t + 1]
    prediction_class = prediction_class[:pred_t + 1]
    pred_boxes = pred_boxes[:pred_t + 1]
    del img
    del prediction
    torch.cuda.empty_cache()
    return masks, prediction_class, pred_boxes
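For context, the surrounding loop is essentially just this (image_paths here is a placeholder for my real list of several hundred files):

def run(image_paths):
    init_maskRCNN()
    for path in image_paths:
        masks, classes, boxes = inference_maskRCNN(path)
        # ... downstream processing of the results ...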
Using this I can get several hundred inferences; while it runs, nvidia-smi shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 30%   48C    P2    42W / 170W |   1949MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3093      G   /usr/lib/xorg/Xorg                187MiB |
|    0   N/A  N/A      3298      G   /usr/bin/gnome-shell               46MiB |
|    0   N/A  N/A      4686      G   ...3/usr/lib/firefox/firefox      175MiB |
|    0   N/A  N/A     25349      C   python                           1535MiB |
+-----------------------------------------------------------------------------+
After some time, however, I got this:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.68 GiB (GPU 0; 11.77 GiB total capacity; 7.90 GiB already allocated; 2.03 GiB free; 8.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
anyway. Does anyone have an idea what I am doing wrong? I have read that del alone may not be enough, which is why I also call empty_cache(), yet that does not help either.
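In case it helps with diagnosing this, here is the kind of logging I can add after each call to watch the allocator (a minimal sketch; the tag argument is just an illustrative label, and memory_allocated()/memory_reserved() are the standard torch.cuda statistics):

def log_gpu_memory(tag):
    # PyTorch allocator statistics for the current CUDA device, in MiB
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    print(f'{tag}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB')

Calling this after every inference_maskRCNN() call should show whether allocated memory really returns to a baseline between images or keeps creeping up.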
Thank you,
Adam