Fastest way to upscale a batch of images

Suppose I have a batch of images with shape (B, H, W) on the GPU, and I want to resize each image so that the result is doubled in width: (B, H, 2*W).

I tested a few approaches on a big batch of images with shape (3000, 224, 224) and got somewhat surprising results.

First, I naively tried implementing it myself:

import torch

def upscale(images: torch.Tensor) -> torch.Tensor:
    B, H, W = images.shape
    # Allocate the doubled-width output, then write the source into the
    # even and odd columns so each pixel is repeated twice along width.
    arr = torch.empty((B, H, W * 2), device=images.device, dtype=images.dtype)
    arr[..., ::2] = images
    arr[..., 1::2] = images
    return arr

upscale_jit = torch.jit.script(upscale)

This takes roughly 13 ms.

Then I tried torch.repeat_interleave(images, 2, -1), which was slightly quicker at 10 ms.

Then I tried F.interpolate(images.unsqueeze(0), (H, W*2)) and got 5.5 ms (!)
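
For reference, this is the kind of timing harness I mean (a minimal sketch using CUDA events, not my exact script; the warm-up and iteration counts are arbitrary):

import torch
import torch.nn.functional as F

def bench(fn, x, iters=100):
    # warm-up so the first kernel launches and allocations aren't counted
    for _ in range(10):
        fn(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average milliseconds per call

images = torch.randn(3000, 224, 224, device="cuda")
B, H, W = images.shape

print(bench(upscale_jit, images))  # upscale_jit defined above
print(bench(lambda x: torch.repeat_interleave(x, 2, -1), images))
print(bench(lambda x: F.interpolate(x.unsqueeze(0), (H, W * 2)).squeeze(0), images))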

So two questions:

  1. How the hell is interpolating faster than just straight-up repeating/copying data?
  2. Any ideas on what could do it even faster?

I'm using torch==1.8 or newer. These results are from my RTX 3060, but this will eventually end up on a Xavier AGX, if that makes any difference :man_shrugging:


I'm seeing a similar ordering of speeds on an A6000, where interpolate with its default settings is the fastest:

0.006442546844482422 (JIT)
0.0036177635192871094 (repeat interleave)
0.0027103424072265625 (nearest-neighbor interpolate)

However, it looks like the default setting uses nearest-neighbor interpolation, which amounts to… copying data. When trying another mode such as “bilinear,” repeat-interleave is faster.

0.006438493728637695 (JIT)
0.003609895706176758 (repeat interleave)
0.0041081905364990234 (bilinear interpolate)
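
For anyone reproducing this, the calls being compared look roughly like the sketch below (same (3000, 224, 224) batch assumed; align_corners is only spelled out to make the bilinear call explicit):

import torch
import torch.nn.functional as F

x = torch.randn(3000, 224, 224, device="cuda")
B, H, W = x.shape

# Default mode is nearest-neighbor: pure data movement, no arithmetic.
nearest = F.interpolate(x.unsqueeze(0), (H, W * 2), mode="nearest").squeeze(0)

# Bilinear computes weighted averages of neighboring pixels, so it does extra math.
bilinear = F.interpolate(x.unsqueeze(0), (H, W * 2), mode="bilinear", align_corners=False).squeeze(0)

# repeat_interleave for comparison: also pure data movement.
repeated = torch.repeat_interleave(x, 2, -1)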

As for the initial speed difference between repeat_interleave and nearest-neighbor interpolation: both are essentially pure data movement with no computation, so my guess is that the two implementations differ enough in how they coordinate work across GPU blocks/threads and in their memory access patterns to account for the gap.

Yeah, I suspected something of this sort.

Still, I wonder if there's anything better.
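
One more thing I might try (untested, just a sketch): expand the last dim without copying and let reshape materialize the output in a single contiguous copy. No idea whether it actually beats nearest-neighbor interpolate; that probably comes down to the same kernel and memory-access details as above.

import torch

def upscale_expand(images: torch.Tensor) -> torch.Tensor:
    B, H, W = images.shape
    # unsqueeze/expand create a zero-copy view with each pixel repeated twice;
    # reshape then materializes it as one contiguous (B, H, 2*W) tensor.
    return images.unsqueeze(-1).expand(B, H, W, 2).reshape(B, H, W * 2)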