Howdy,
Applying RoIAlign (torchvision.ops.RoIAlign) to rois given as a tensor of shape (K, 5) results in a segfault: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).
The faulty behavior is not observed when:
- RoIAlign is applied to rois given as a list of tensors of shape (L, 4)
- the RoIAlign input size is small enough (typically at most 32,10,10)
Here is a code snippet that should allow you to reproduce my observations.
import argparse

import torch
from torchvision.ops import RoIAlign

NB_ROI_PER_DOC = 5

if __name__ == "__main__":
    torch.manual_seed(0)
    parser = argparse.ArgumentParser(description='minimum reproducible example')
    parser.add_argument('--bug', action='store_true', default=False, help='toggle on the issue')
    parser.add_argument('--batch-size', type=int, default=1)
    parser.add_argument('--input-size', type=str, default="64,10,10")
    args = parser.parse_args()

    align = RoIAlign((3, 3), spatial_scale=14 / 224, sampling_ratio=2)
    batch_size = args.batch_size
    input_ = torch.randn((batch_size, *(int(dim) for dim in args.input_size.split(","))))

    # Default case: one (NB_ROI_PER_DOC, 4) tensor of boxes per batch element
    rois = [torch.abs(torch.randn(NB_ROI_PER_DOC, 4)) for _ in input_]
    if args.bug:
        # Problematic case: a single (K, 5) tensor whose first column is a running ROI id
        nb_rois = NB_ROI_PER_DOC * batch_size
        boxes = torch.abs(torch.randn(nb_rois, 4))
        roi_ids = torch.arange(nb_rois).view(-1, 1)
        rois = torch.cat((roi_ids, boxes), dim=1)

    output = align(input=input_, rois=rois)
    print(rois)
    print(output.shape)
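For comparison, the torchvision documentation describes the (K, 5) format as one row per box, where the first column is the index of the batch element the box belongs to, i.e. a number in [0, N - 1] for a batch of N inputs. Below is a minimal sketch of building and checking rois that way. The names boxes_list_to_rois and batch_indices_in_range are hypothetical helpers written for illustration, not torchvision functions, and whether this is related to the crash is not established here.

```python
import torch


def boxes_list_to_rois(boxes_per_image):
    """Stack per-image (L, 4) box tensors into a single (K, 5) tensor.

    Hypothetical helper, not part of torchvision: column 0 holds the index
    of the batch element each box belongs to, columns 1-4 hold the box.
    """
    rows = []
    for batch_idx, boxes in enumerate(boxes_per_image):
        # One index column per image, filled with that image's batch index
        idx_col = torch.full((boxes.shape[0], 1), batch_idx, dtype=boxes.dtype)
        rows.append(torch.cat((idx_col, boxes), dim=1))
    return torch.cat(rows, dim=0)


def batch_indices_in_range(rois, batch_size):
    """Return True if every value in column 0 lies in [0, batch_size - 1]."""
    idx = rois[:, 0]
    return bool(((idx >= 0) & (idx < batch_size)).all())


torch.manual_seed(0)
batch_size, nb_roi_per_doc = 2, 5  # illustrative values
boxes_per_image = [torch.abs(torch.randn(nb_roi_per_doc, 4)) for _ in range(batch_size)]
rois = boxes_list_to_rois(boxes_per_image)
print(rois.shape)
print(batch_indices_in_range(rois, batch_size))
```

Running the same check on the tensor built in the --bug branch above, with the default batch size of 1, would show whether its first column stays within the valid range.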
I run the code in Docker to freeze package versions and the like.
FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
RUN mkdir -p /opt/debug
WORKDIR /opt/debug
ADD mre.py .
ENTRYPOINT ["python", "/opt/debug/mre.py"]
Sample outputs:
Problematic behavior. To run: docker build -f Dockerfile.minimal -t debug-torch:v0 . ; docker run debug-torch:v0 --bug ; docker ps -a | grep debug-torch:v0
which results in:
46b40fe6efee debug-torch:v0 "python /opt/debug/m…" 16 seconds ago Exited (139)
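The 139 status reported by Docker matches the SIGSEGV message quoted at the top: by shell convention, a status above 128 means the process was terminated by signal number (status - 128), and 139 - 128 = 11. A small sketch of that decoding, where describe_exit_status is a hypothetical helper written for illustration:

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status (hypothetical helper).

    A status above 128 is read as "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(139))
```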
Expected behavior. To run: docker run debug-torch:v0
which results in:
[tensor([[1.4164, 0.2379, 0.9334, 1.1331],
[0.3530, 2.0928, 0.6356, 1.5069],
[0.9527, 1.0599, 0.9549, 1.3355],
[0.5251, 0.7416, 0.4269, 0.4008],
[0.7872, 0.0834, 1.1256, 1.5490]])]
torch.Size([5, 64, 3, 3])
Reducing the input size solves the problem. To run: docker run debug-torch:v0 --bug --input-size 5,5,5
which results in:
tensor([[0.0000, 0.3584, 1.5616, 0.3546, 1.0811],
[1.0000, 0.8760, 0.2871, 1.0216, 0.5111],
[2.0000, 1.7137, 0.5101, 0.4749, 0.6334],
[3.0000, 1.2063, 0.6074, 0.5472, 1.1005],
[4.0000, 0.7201, 0.0119, 0.3398, 0.2635]])
torch.Size([5, 64, 3, 3])
I’m well aware that the most likely reason for the problem is that I am missing something obvious in how I use RoIAlign; feel free to simply let me know if this is the case.
Thanks for reading down to this point & have a nice day.