Hi, let’s say I have a grid, grid, which is a 3D representation of size (size, size, size), and I’d like to apply a rotation, scaling and translation (R, S, T) to it. All three are 4x4 matrices in homogeneous coordinates, with T = [Identity(4,3) | t], where Identity(4,3) is an identity matrix of 4 rows and 3 columns and t is a vector of size 4 with 1 in its last position.
The equivalent transformation is defined as
theta = torch.bmm(torch.bmm(T, S), R)
To generate the sampling positions I call
sample_grid = affine_grid(theta, (batch_size, num_channels, new_size, new_size, new_size))
But according to the implementation, affine_grid does the following:
Tensor base_grid = make_base_grid_5D(theta, N, C, D, H, W, align_corners);
auto grid = base_grid.view({N, D * H * W, 4}).bmm(theta.transpose(1, 2));
From my understanding of multiview geometry (which is very limited, so there’s a high chance I’m wrong), this computes where each point of the new grid maps to in the old grid by applying the transformation theta to the new grid’s coordinates. But since what I want is to apply a rotation, scaling and translation to the original grid, I’d have to use the inverse of that theta to achieve it.
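Here is a minimal sketch of how I would check this empirically. It assumes a toy 5x5x5 volume with a single bright voxel, identity R and S, a translation of +0.5 along x in normalized [-1, 1] coordinates, and align_corners=True (so one voxel is exactly 0.5 wide). The helpers warp and bright_x are names I made up for illustration, and I pass only the top 3 rows of the 4x4 matrix because affine_grid expects a (N, 3, 4) theta for 3D inputs.

```python
import torch
import torch.nn.functional as F

# Toy volume: a single bright voxel at the centre of a 5x5x5 grid.
size = 5
vol = torch.zeros(1, 1, size, size, size)
vol[0, 0, 2, 2, 2] = 1.0

# Forward transform in normalized coordinates: shift by +0.5 along x
# (x is the last tensor dimension, W). R and S are identity here (assumption),
# so theta = T @ S @ R reduces to T.
theta = torch.eye(4).unsqueeze(0)
theta[0, 0, 3] = 0.5

def warp(volume, mat4x4):
    # affine_grid takes the top 3 rows, shape (N, 3, 4), for 5D inputs.
    grid = F.affine_grid(mat4x4[:, :3, :], volume.shape, align_corners=True)
    return F.grid_sample(volume, grid, align_corners=True)

def bright_x(volume):
    # x index of the brightest voxel along the central row.
    return int(volume[0, 0, 2, 2].argmax())

out_theta = warp(vol, theta)                       # theta passed directly
out_inverse = warp(vol, torch.linalg.inv(theta))   # inverse of theta passed

print("theta:  ", bright_x(out_theta))
print("inverse:", bright_x(out_inverse))
```

If my reasoning is right, only the inverse moves the bright voxel in the direction of the translation (towards larger x), while passing theta directly moves it the opposite way.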
Math explanation
# The final position (x', y', z') obtained by applying a rotation R, scaling S and translation T to a point (x, y, z) is:
# theta * (x, y, z) = T * S * R * (x, y, z) = (x', y', z')
# Multiplying both sides by the inverses:
# (x, y, z) = R^-1 * S^-1 * T^-1 * (x', y', z')
This way I correctly apply the transformation to my original grid, and not the other way around (the transform from the new grid back to the old one).
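The inverse relation above can be sanity-checked numerically. This is only a sketch with made-up example values (a 90-degree rotation about z, a uniform scale of 2, a shift of 0.5 along x), using unbatched 4x4 matrices for brevity.

```python
import math
import torch

# Hypothetical example values, chosen only for illustration.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
R = torch.tensor([[c, -s, 0.0, 0.0],
                  [s,  c, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
S = torch.diag(torch.tensor([2.0, 2.0, 2.0, 1.0]))
T = torch.eye(4)
T[0, 3] = 0.5

theta = T @ S @ R
# Inverse built factor by factor, in reversed order.
theta_inv = torch.linalg.inv(R) @ torch.linalg.inv(S) @ torch.linalg.inv(T)

p = torch.tensor([0.25, -0.5, 0.1, 1.0])  # homogeneous point (x, y, z, 1)
p_new = theta @ p                          # (x', y', z', 1)
p_back = theta_inv @ p_new                 # should recover (x, y, z, 1)

print(torch.allclose(theta_inv, torch.linalg.inv(theta), atol=1e-6))
print(torch.allclose(p_back, p, atol=1e-6))
```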
Final question: do I use theta or theta^-1 to create the sampling grid?
Please let me know if I failed to explain my doubt clearly, and if I’m wrong, why.
Thanks!