Crop_and_resize in PyTorch


Is there anything like tensorflow’s crop_and_resize in torch? I want to use interpolation instead of roi_pooling.


Maybe you can use a spatial transformer layer.

I ported crop_and_resize from TensorFlow:
Hope this is useful for you.


That RoIAlign worked well. :slight_smile:

Seriously ???

This is just running C source underneath, and it was copied from the multimodallearning MRCNN repo. At least give the original author credit.

And do not claim it was ported to PyTorch when it was not.

It was ported from the TensorFlow source code, and I mentioned this everywhere, in the readme and here.

I didn’t know which multimodallearning MRCNN repo you meant.
It worked well for PyTorch < 0.4. I recommend using facebookresearch/maskrcnn-benchmark instead, since the cffi API changed between PyTorch 0.4 and 1.0.
My port was finished in Dec 2017, while maskrcnn-benchmark was started in Oct 2018. You can post the repo if you still think I copied any code from it. Such an accusation is the biggest insult to a programmer.

And at least, please check the facts before posting a comment.

OK, I found the repo, multimodallearning/pytorch-mask-rcnn, and they are actually using my code.
This is also mentioned in their readme:

We use functions from two more repositories that need to be built with the right --arch option for cuda support. The two functions are Non-Maximum Suppression from ruotianluo’s pytorch-faster-rcnn repository and longcw’s RoiAlign.

So this is a misunderstanding, and I am very glad that my code is useful to others.


My bad, I apologize. :sweat_smile:
I should have checked the dates.
Yup, the code certainly helps.
Is there any implementation done in a more “PyTorch” way, without the C source? Say, using the functional.interpolate() layer?

Or could you explain the geometric transform behind how you calculate “y_in” and “x_in” in the per-box loop?

I want to use this for randomly oriented quadrilateral RoIs, defined as rotated bounding boxes with 8 coordinates as input, as opposed to a horizontal box defined by 4 coordinates.
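For the axis-aligned case, my understanding of the TF-style mapping is that “y_in” and “x_in” are just a linear interpolation between the two box edges, expressed in source-pixel units. A minimal sketch (the function name and box convention are mine, and it assumes a crop size greater than 1; for a rotated box you would replace this per-axis mapping with an affine transform of the output grid):

```python
import torch

def crop_coords(box, image_size, crop_size):
    """Sampling coordinates in the style of TF's crop_and_resize.

    box:        (y1, x1, y2, x2) in normalized [0, 1] coordinates
    image_size: (H, W) of the source feature map
    crop_size:  (crop_h, crop_w) of the output crop (each > 1)
    Returns y_in of shape (crop_h,) and x_in of shape (crop_w,),
    both in source-pixel coordinates.
    """
    y1, x1, y2, x2 = box
    H, W = image_size
    ch, cw = crop_size
    i = torch.arange(ch, dtype=torch.float32)
    j = torch.arange(cw, dtype=torch.float32)
    # Corner-aligned: row 0 lands on y1*(H-1), the last row on y2*(H-1),
    # with evenly spaced sampling points in between.
    y_in = y1 * (H - 1) + i * (y2 - y1) * (H - 1) / (ch - 1)
    x_in = x1 * (W - 1) + j * (x2 - x1) * (W - 1) / (cw - 1)
    return y_in, x_in
```

Each (y_in[i], x_in[j]) pair is then sampled from the feature map with bilinear interpolation.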


Check this repo for RoI pooling done in a PyTorch way:

This repo used F.grid_sample at first and changed to my crop_and_resize in this commit. The newest version uses roi_align from facebookresearch/maskrcnn-benchmark.

Hi, thanks for your great work.
I’ve got one problem. I’ve seen a version of ROI Align, one of whose parameters is the spatial_scale, representing the scale to map the feature coordinate to the original image. For example, if the original image is 224x224 and the feature map is 14x14, then the spatial_scale is 16.
In your version of ROI Align there is no such parameter. So I guess the ROIs being fed to this function should be scaled before hand, e.g., they should be divided by 16 in the example above. Am I right?

Hi, I am trying to understand the meaning of the spatial_scale parameter, but the documentation is not clear to me.
Reading the source code, the reverse order makes more sense to me, e.g., 14/224 = 0.0625.

Please refer to torchvision.transforms — Torchvision 0.11.0 documentation