Suppose I have a batch of images of shape (B, H, W) on the GPU, and I want to resize each image so that its width is doubled, giving a result of shape (B, H, 2*W).
I tested a few approaches on a big batch of images with shape (3000, 224, 224) and got somewhat surprising results.
However, it looks like the default setting uses nearest-neighbor interpolation, which amounts to… copying data. With another mode such as “bilinear,” repeat-interleave is faster.
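To make the "copying data" point concrete, here is a minimal sketch. It assumes the two approaches being compared are `torch.repeat_interleave` and `torch.nn.functional.interpolate` (the post does not show the exact calls), and it uses a small batch so that it also runs on a CPU-only machine.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

# Small batch for illustration; the post used (3000, 224, 224).
B, H, W = 4, 8, 8
x = torch.rand(B, H, W, device=device)

# Approach 1: repeat every column twice along the width axis.
doubled_interleave = torch.repeat_interleave(x, 2, dim=-1)

# Approach 2: interpolate with the default mode ("nearest").
# A 3-D input is treated as (N, C, L), so only the last axis is resized.
doubled_nearest = F.interpolate(x, scale_factor=2, mode="nearest")

# Approach 3: bilinear needs a 4-D (N, C, H, W) input, so add a channel axis
# and keep the height fixed while doubling the width.
doubled_bilinear = F.interpolate(
    x.unsqueeze(1), size=(H, 2 * W), mode="bilinear", align_corners=False
).squeeze(1)

print(doubled_interleave.shape, doubled_nearest.shape, doubled_bilinear.shape)
print("nearest matches repeat_interleave:", torch.equal(doubled_interleave, doubled_nearest))
```

When the nearest-neighbor output matches the repeat-interleave output element for element, both operations are pure copies, whereas bilinear mode has to compute new pixel values.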
As for the initial speed difference between interleave and nearest-neighbor interpolation: the two may be very similar in high-level approach (no computation, just data movement), but my guess is that their implementation strategies (how the work is coordinated across GPU blocks/threads, and their memory access patterns) differ enough to account for it.
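For anyone who wants to reproduce the comparison, one way to time the candidates is `torch.utils.benchmark.Timer`, which handles warm-up and CUDA synchronization. This is only a sketch under assumptions: the statements below are my guess at the calls being compared, the batch is reduced so it runs quickly anywhere, and the numbers will depend on the GPU and PyTorch version.

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

device = "cuda" if torch.cuda.is_available() else "cpu"

# Reduced batch for a quick run; the post used (3000, 224, 224).
x = torch.rand(64, 224, 224, device=device)

# Assumed candidates; each doubles the width of every image in the batch.
candidates = {
    "repeat_interleave": "torch.repeat_interleave(x, 2, dim=-1)",
    "interpolate (nearest)": "F.interpolate(x, scale_factor=2, mode='nearest')",
    "interpolate (bilinear)": (
        "F.interpolate(x.unsqueeze(1), size=(x.shape[1], 2 * x.shape[2]), "
        "mode='bilinear', align_corners=False)"
    ),
}

timings = {}
for label, stmt in candidates.items():
    timer = benchmark.Timer(stmt=stmt, globals={"torch": torch, "F": F, "x": x})
    # Median time per call, in seconds, over 20 invocations.
    timings[label] = timer.timeit(20).median
    print(f"{label:>24}: {timings[label] * 1e3:.3f} ms")
```

Timing GPU code with plain wall-clock calls and no synchronization can be misleading because kernels launch asynchronously, which is why the sketch relies on the benchmark utility.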