Hi,
I get a segmentation fault when running inference on CPU with larger batch sizes.
Environment:
- CPU name: AMD EPYC 7763 64-Core Processor x 2
- Docker image: nvcr.io/nvidia/pytorch:20.10-py3
- torch: 1.7.0a0+7036e91
- torchvision: 0.8.0a0
Setting:
- Number of workers: 64
- Batch size: [32, 64, 128, 256, 512]
Model: torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
Image: https://cocodataset.org/#explore?id=342322
To reproduce:
import os
import cv2
import torch
import torchvision
device = torch.device('cpu')
batchsize = 258
# Load the pretrained Mask R-CNN and switch it to inference mode.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
model.to(device)
# Read the test image and convert it from OpenCV's BGR order to RGB.
img_path = os.path.join('test_one_img', 'test.jpg')
img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = torchvision.transforms.functional.to_tensor(img)
# Build the batch by repeating the same image tensor batchsize times.
imgs = [img.to(device) for _ in range(batchsize)]
output = model(imgs)
Result:
Batch size <128: works fine.
Batch size >256: segmentation fault.
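Since the smaller batch sizes run fine, one possible stopgap is to split a large batch into smaller chunks and concatenate the results. This is only a sketch, assuming the crash depends solely on how many images go through a single forward pass. The helper names `chunked` and `run_in_chunks` are hypothetical and not part of torchvision, and the default chunk size of 64 is simply one of the batch sizes that worked above.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_chunks(model: Callable[[List[T]], List[R]],
                  images: Sequence[T],
                  chunk_size: int = 64) -> List[R]:
    """Call `model` on small chunks and concatenate the per-image outputs.

    Hypothetical helper: it assumes `model` takes a list of images and
    returns a list with one result per image.
    """
    outputs: List[R] = []
    for chunk in chunked(images, chunk_size):
        outputs.extend(model(list(chunk)))
    return outputs
```

In the script above this would replace the final call with something like `output = run_in_chunks(model, img, chunk_size=64)`. It does not address the underlying crash, it only avoids the large single batch.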
gdb debug info:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
ROIAlignForward<float> (nthreads=nthreads@entry=90617856, input=0x7f5d13b50040, spatial_scale=@0x7ffcd15a953c: 0.25,
channels=channels@entry=256, height=height@entry=200, width=width@entry=272, pooled_height=14, pooled_width=14, sampling_ratio=2,
aligned=false, rois=0x562a6c532300, output=0x7f6e507aa040) at /tmp/pip-req-build-6hs294b4/torchvision/csrc/cpu/ROIAlign_cpu.cpp:199
199 /tmp/pip-req-build-6hs294b4/torchvision/csrc/cpu/ROIAlign_cpu.cpp: No such file or directory.
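A quick sanity check on the numbers in this frame suggests the crash might be related to 32-bit index overflow, although I have not confirmed this against the torchvision source. The sketch below rests on two assumptions: that `nthreads` equals number of ROIs x channels x pooled_height x pooled_width, and that the input feature map has shape (batch, channels, height, width) and is indexed with 32-bit integers.

```python
INT32_MAX = 2**31 - 1

# Values copied from the gdb frame above.
channels, height, width = 256, 200, 272
pooled_height, pooled_width = 14, 14
nthreads = 90617856

# Assumption: nthreads = num_rois * channels * pooled_height * pooled_width.
num_rois, remainder = divmod(nthreads, channels * pooled_height * pooled_width)

# Assumption: the input feature map is (batch, channels, height, width).
elements_per_image = channels * height * width


def input_elements(batch_size: int) -> int:
    """Number of float elements in the feature map for a given batch size."""
    return batch_size * elements_per_image


# Largest batch whose element count still fits in a signed 32-bit integer.
max_safe_batch = INT32_MAX // elements_per_image

print("ROIs in this call:", num_rois, "remainder:", remainder)
print("largest batch that fits in int32:", max_safe_batch)
for batch_size in (128, 256, 258):
    n = input_elements(batch_size)
    status = "exceeds int32" if n > INT32_MAX else "fits in int32"
    print(batch_size, n, status)
```

Under these assumptions the frame corresponds to 1806 ROIs (7 per image for 258 images), and the feature map passes the signed 32-bit limit somewhere above 154 images. That would be consistent with batch size 128 working and 256 or more crashing, but it is only a hypothesis.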
Any help is appreciated.
Thanks!