Spatial Transformer Networks - Why Interpolation/Sampling?

Hey,
After the transformation is computed and the sampling grid (the transformed input grid) is generated, the sampling process (assigning values to the grid) starts. Is the following explanation of the sampling process correct? The coordinates of the sampling grid are generally no longer integers but real-valued (floats). Because the output grid is supposed to be an integer grid again, the value at each integer position in the output is interpolated from the input: the corresponding sampling-grid point identifies a location in the input grid, and the values of the input pixels neighboring that location are combined to produce the output value.
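
To make the question concrete, here is a minimal NumPy sketch of the sampling step as I understand it. It assumes bilinear interpolation, sampling coordinates given in pixel units (not the normalized [-1, 1] range), and clamping at the image border. The function name `bilinear_sample` and the toy values are hypothetical, only for illustration, not taken from any library.

```python
import numpy as np

def bilinear_sample(image, xs, ys):
    """Sample a 2D image at real-valued pixel coordinates (xs, ys).

    xs and ys have the shape of the output grid: one sampling point
    per integer output position.
    """
    h, w = image.shape

    # Integer input-grid neighbors around each real-valued sampling point.
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = x0 + 1
    y1 = y0 + 1

    # Fractional offsets act as the interpolation weights.
    wx = xs - x0
    wy = ys - y0

    # Assumption: clamp neighbor indices to the image border.
    x0 = np.clip(x0, 0, w - 1)
    x1 = np.clip(x1, 0, w - 1)
    y0 = np.clip(y0, 0, h - 1)
    y1 = np.clip(y1, 0, h - 1)

    # Interpolate along x on the two neighboring rows, then along y.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

# Toy 2x2 input image.
image = np.array([[0.0, 10.0],
                  [20.0, 30.0]])

# Sampling grid: for each integer output position, a float location in the input.
xs = np.array([[0.0, 0.5],
               [0.25, 1.0]])
ys = np.array([[0.0, 0.5],
               [1.0, 1.0]])

out = bilinear_sample(image, xs, ys)
print(out)
```

In this sketch, the output position whose sampling point is (0.5, 0.5) lies between all four input pixels and receives their average, while sampling points that land exactly on integer input coordinates simply copy that input pixel.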