Torchvision.ops.roi_align crashes session in Colab

I’m trying to use ops.roi_align in my Faster R-CNN. When I run the training loop, everything goes fine in the first epoch, but when the second epoch starts and training reaches the roi_align step, Colab crashes due to lack of RAM.

Once it raised this error, but I can’t reproduce it because Colab crashes first:

---> 206     roi_out = ops.roi_align(feature_map, proposals_list, self.roi_size)
     207     print("CM stop point 1")
     208     print(roi_out)

/usr/local/lib/python3.10/dist-packages/torchvision/ops/roi_align.py in roi_align(input, boxes, output_size, spatial_scale, sampling_ratio, aligned)
     61     if not isinstance(rois, torch.Tensor):
     62         rois = convert_boxes_to_roi_format(rois)
---> 63     return torch.ops.torchvision.roi_align(
     64         input, rois, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
     65     )

/usr/local/lib/python3.10/dist-packages/torch/_ops.py in __call__(self, *args, **kwargs)
    500         # We save the function ptr as the op attribute on
    501         # OpOverloadPacket to access it here.
--> 502         return self._op(*args, **kwargs or {})
    503
    504     # TODO: use this to make a dir

ValueError: cannot create std::vector larger than max_size()

When I run the same code in JupyterLab, nothing crashes and no error appears; instead, execution gets stuck at this step and hangs indefinitely.

The code ended up being quite long, so I don’t know exactly what needs to be attached to the problem description :frowning:
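For now, here is a simplified, self-contained sketch of just the roi_align call (not my real training code: `feature_map`, `proposals_list`, and the roi size mirror the names in my module, the tensors are dummies, and the shapes/box format are what I understand torchvision.ops.roi_align expects, i.e. an [N, C, H, W] feature map plus a list of per-image [K, 4] box tensors):

```python
import torch
from torchvision import ops

# Dummy stand-in for the backbone output: [N, C, H, W]
feature_map = torch.randn(2, 256, 50, 50)

# One tensor of [K_i, 4] boxes per image, in (x1, y1, x2, y2) format.
# With the default spatial_scale=1.0 the coordinates are interpreted in
# feature-map scale; otherwise spatial_scale should be passed to rescale them.
proposals_list = [
    torch.tensor([[ 4.0,  4.0, 20.0, 24.0],
                  [10.0, 12.0, 40.0, 36.0]]),
    torch.tensor([[ 2.0,  2.0, 15.0, 15.0]]),
]

roi_size = (7, 7)  # corresponds to self.roi_size in my module
roi_out = ops.roi_align(feature_map, proposals_list, roi_size)
print(roi_out.shape)  # torch.Size([3, 256, 7, 7]) -- one crop per box
```

In my actual model the proposals come out of the RPN rather than being hard-coded like this, but the call itself looks the same.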

Edit 1: I forgot to mention that earlier I used roi_pool. The Faster R-CNN trained without any problems, but the bboxes the trained network returned (one per image) were always larger than the image itself.