Does PyTorch handle bilinear sampling for Mask-RCNN?

jsuit · October 4, 2017, 3:45am

I know there is torch.nn.functional.grid_sample, but I’m not sure that helps here.

ruotianluo · October 5, 2017, 12:35am

This is how I implement RoIAlign using affine_grid and grid_sample.

ruotianluo/pytorch-faster-rcnn/blob/master/lib/nets/network.py#L91




def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred):
  rois, rpn_scores = proposal_layer(
      rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode,
      self._feat_stride, self._anchors, self._num_anchors)


  return rois, rpn_scores


def _roi_pool_layer(self, bottom, rois):
  return RoIPoolFunction(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1. / 16.)(bottom, rois)


def _crop_pool_layer(self, bottom, rois, max_pool=True):
  # implement it using stn
  # box to affine
  # input (x1,y1,x2,y2)
  """
  [  x2-x1             x1 + x2 - W + 1  ]
  [  -----      0      ---------------  ]
  [  W - 1                  W - 1       ]
  [                                     ]
  [           y2-y1    y1 + y2 - H + 1  ]
  [    0      -----    ---------------  ]
  [           H - 1         H - 1       ]
  """

jsuit · October 5, 2017, 2:42pm

@ruotianluo
Thanks! I have a few questions. First, why are you dividing by 16 in x1 = rois[:, 1::4] / 16.0? Second, I’m conceptually confused by:

 theta[:, 0, 0] = (x2 - x1) / (width - 1)
 theta[:, 0, 2] = (x1 + x2 - width + 1) / (width - 1)
 theta[:, 1, 1] = (y2 - y1) / (height - 1)
 theta[:, 1, 2] = (y1 + y2 - height + 1) / (height - 1)

What is this calculating? I assume it’s x, y, width and height? If so, why do you pass it to F.affine_grid? These don’t look like angles that you would use to transform bottom. (And what is bottom?)

Thanks!

ruotianluo · October 5, 2017, 3:05pm

The input coordinates are in the original image scale, and bottom is the conv feature map that RoIAlign is applied to, so dividing by 16 converts the coordinates to feature-map scale.
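
To make the scale change concrete, here is a minimal sketch in plain Python. It assumes a RoI laid out as (batch_idx, x1, y1, x2, y2), which matches the rois[:, 1::4] indexing quoted above; the helper name and the box values are made up for illustration, and only the stride of 16 comes from the thread.

```python
# Feature stride taken from the thread (1. / 16. in _roi_pool_layer).
FEAT_STRIDE = 16.0


def roi_to_feature_scale(roi, stride=FEAT_STRIDE):
    """Convert (batch_idx, x1, y1, x2, y2) from image pixels to feature-map units.

    Hypothetical helper, not part of the linked repository.
    """
    batch_idx, x1, y1, x2, y2 = roi
    # The batch index is left alone; only the four coordinates are rescaled.
    return (batch_idx, x1 / stride, y1 / stride, x2 / stride, y2 / stride)


roi_image = (0, 32.0, 48.0, 160.0, 208.0)  # made-up box in image pixels
roi_feat = roi_to_feature_scale(roi_image)
print(roi_feat)  # (0, 2.0, 3.0, 10.0, 13.0)
```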

theta is the transformation matrix. (x1, y1) is the top-left corner and (x2, y2) is the bottom-right corner of the RoI.

If you don’t fully understand, you can set x1, y1, x2, y2 and bottom to some values and run the function.
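
One way to run that experiment is sketched below. It is not the code from the repository: the feature map and box values are made up, and it assumes a PyTorch version where F.affine_grid and F.grid_sample accept align_corners (1.3 or later). align_corners=True is passed explicitly because the theta formula treats pixel 0 and pixel width - 1 as the ends of the [-1, 1] range.

```python
import torch
import torch.nn.functional as F

# Toy feature map: 1 image, 1 channel, 6 rows x 8 columns, values 0..47.
height, width = 6, 8
bottom = torch.arange(height * width, dtype=torch.float32).view(1, 1, height, width)

# One RoI, already in feature-map coordinates (made-up values).
x1, y1, x2, y2 = 2.0, 1.0, 5.0, 4.0

# theta as written in the thread: scale on the diagonal, shift in the last column.
theta = torch.zeros(1, 2, 3)
theta[:, 0, 0] = (x2 - x1) / (width - 1)
theta[:, 0, 2] = (x1 + x2 - width + 1) / (width - 1)
theta[:, 1, 1] = (y2 - y1) / (height - 1)
theta[:, 1, 2] = (y1 + y2 - height + 1) / (height - 1)

# Output size chosen so that every sample point lands on an integer pixel,
# which makes the result directly comparable to a plain slice.
out_h, out_w = int(y2 - y1) + 1, int(x2 - x1) + 1

grid = F.affine_grid(theta, [1, 1, out_h, out_w], align_corners=True)
crops = F.grid_sample(bottom, grid, mode="bilinear", align_corners=True)

# Reference: slice the same box out of the feature map directly.
expected = bottom[:, :, int(y1):int(y2) + 1, int(x1):int(x2) + 1]
print(torch.allclose(crops, expected, atol=1e-3))
```

With a different output size the sample points fall between pixels, and grid_sample interpolates bilinearly between the neighbouring values.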

jsuit · October 5, 2017, 7:22pm

@ruotianluo Again, thanks. But my confusion is two-fold. First, I don’t see why the values of the transformation matrix are what they are. Second, I don’t see why the code

grid = F.affine_grid(theta, torch.Size((rois.size(0), 1, pre_pool_size, pre_pool_size)))
crops = F.grid_sample(bottom.expand(rois.size(0), bottom.size(1), bottom.size(2), bottom.size(3)), grid)

would give you what you wanted. You are sampling after applying an affine transformation, so there is no reason to think that you are still sampling anywhere near a region of interest.

I must be missing something obvious.

ruotianluo · October 6, 2017, 2:22am

Try this snippet


https://gist.github.com/ruotianluo/043fae22e9f8fd1b36b82189f2356937

test_roialign.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

import numpy as np

This file has been truncated.

jsuit · October 21, 2017, 9:25pm

@ruotianluo Thanks, but it still leaves me confused as to how you are deriving theta. Why is theta defined the way it is? Maybe I’m missing something obvious, but I still don’t see why this implements ROIAlign.

ruotianluo · October 22, 2017, 3:20am

jsuit · October 22, 2017, 10:19pm

@ruotianluo
Yes, I understand transformation matrices and affine transformations. But that doesn’t explain where you got the values for theta. To give a concrete example of my confusion, why is theta[:, 0, 0] = (x2 - x1) / (width - 1)? I understand that theta is the transformation matrix; what I don’t see is why you chose those particular values for it.

ruotianluo · October 23, 2017, 2:05am

(x2 - x1) / (width - 1) is the scaling term. After the transformation the width is x2 - x1; before it, the width is width - 1. (It is width - 1 rather than width because each pixel is treated as sitting on a grid corner, so a feature map that is width pixels wide spans width - 1 units.)
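
The same reasoning can be checked with plain arithmetic. The sketch below assumes the convention described above, where normalized coordinate -1 is pixel 0 and +1 is pixel size - 1; the helper names and the box values are made up for illustration. It applies theta to the corners and the center of the output grid and converts the result back to pixel coordinates.

```python
def make_theta(x1, y1, x2, y2, width, height):
    """Build the 2x3 matrix from the thread as nested lists."""
    return [
        [(x2 - x1) / (width - 1), 0.0, (x1 + x2 - width + 1) / (width - 1)],
        [0.0, (y2 - y1) / (height - 1), (y1 + y2 - height + 1) / (height - 1)],
    ]


def apply_theta(theta, u, v):
    """Map an output-grid point (u, v) in [-1, 1] to normalized input coordinates."""
    x = theta[0][0] * u + theta[0][1] * v + theta[0][2]
    y = theta[1][0] * u + theta[1][1] * v + theta[1][2]
    return x, y


def to_pixel(norm, size):
    """Normalized [-1, 1] -> pixel coordinate, with -1 at 0 and +1 at size - 1."""
    return (norm + 1.0) / 2.0 * (size - 1)


width, height = 50, 38                    # feature-map size (made up)
x1, y1, x2, y2 = 10.0, 5.0, 30.0, 20.0    # RoI in feature-map coordinates (made up)
theta = make_theta(x1, y1, x2, y2, width, height)

for u, v in [(-1.0, -1.0), (1.0, 1.0), (0.0, 0.0)]:
    xn, yn = apply_theta(theta, u, v)
    px, py = to_pixel(xn, width), to_pixel(yn, height)
    # Rounded only to hide floating-point noise in the printout.
    print((u, v), "->", (round(px, 6), round(py, 6)))
```

The output-grid corners (-1, -1) and (1, 1) land on (x1, y1) and (x2, y2), and the center lands on the middle of the box, so the grid that affine_grid produces covers exactly the RoI.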

longcw · December 7, 2017, 1:25am

I ported crop_and_resize from TensorFlow to PyTorch.
F.grid_sample has to expand the input feature map, which costs a lot of memory when there are many RoIs:

crops = F.grid_sample(bottom.expand(rois.size(0), bottom.size(1), bottom.size(2), bottom.size(3)), grid)

sal · January 4, 2019, 5:40pm

How can this be ported to quadrilateral RoIs defined by 8 coordinates, not just horizontal bounding boxes?

Would it make sense to use a cv2 perspective transform to calculate theta as input to affine_grid?
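
One thing to keep in mind is that F.affine_grid takes a 2x3 affine matrix per sample, while a perspective transform is a 3x3 homography, so the two do not line up directly. A possible alternative, sketched here in plain Python as an assumption rather than anything proposed in this thread, is to skip affine_grid and build the sampling grid yourself. The hypothetical helper below blends the four corners bilinearly (a bilinear warp, not a true perspective transform) and returns normalized (x, y) pairs under the same corner-aligned convention as the theta above.

```python
def quad_sampling_grid(corners, out_h, out_w, width, height):
    """Build an out_h x out_w grid of normalized (x, y) sample points.

    corners: [(x, y)] in feature-map pixels, ordered top-left, top-right,
             bottom-right, bottom-left.
    width, height: size of the feature map being sampled.
    Hypothetical helper for illustration only.
    """
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = corners
    grid = []
    for i in range(out_h):
        t = i / (out_h - 1) if out_h > 1 else 0.0   # vertical position in [0, 1]
        row = []
        for j in range(out_w):
            s = j / (out_w - 1) if out_w > 1 else 0.0   # horizontal position in [0, 1]
            # Bilinear blend of the four corners.
            x = (1 - s) * (1 - t) * ax + s * (1 - t) * bx + s * t * cx + (1 - s) * t * dx
            y = (1 - s) * (1 - t) * ay + s * (1 - t) * by + s * t * cy + (1 - s) * t * dy
            # Pixel -> normalized [-1, 1], with pixel 0 at -1 and pixel size - 1 at +1.
            row.append((2.0 * x / (width - 1) - 1.0, 2.0 * y / (height - 1) - 1.0))
        grid.append(row)
    return grid


# Made-up quadrilateral on an 8 x 6 feature map.
quad = [(2.0, 1.0), (6.0, 2.0), (5.0, 4.0), (1.0, 3.0)]
grid = quad_sampling_grid(quad, out_h=3, out_w=3, width=8, height=6)
print(grid[0][0], grid[2][2])
```

Converted to a float tensor of shape (1, out_h, out_w, 2), a grid like this could be passed to F.grid_sample with align_corners=True. For an axis-aligned box it reduces to the same sample points as the affine theta.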