Hello, I noticed that inference time scales poorly with larger input sizes when using a resnet50 network:
input size: 224
batch size: 1: 6.78 (ms)
batch size: 4: 11.99 (ms)
batch size: 16: 40.49 (ms)
input size: 640
batch size: 1: 20.93 (ms)
batch size: 4: 84.83 (ms)
batch size: 16: 331.74 (ms)
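To quantify this, here are the ratios of the times above compared to the naive expectation that the work grows with batch size and with pixel count per image (this is just arithmetic on the numbers above):

# Ratios of the measured times vs. the naive expectation that work
# scales with batch size and with pixel count per image.
print((640 / 224) ** 2)        # ~8.16x more pixels per image
print(331.74 / 40.49)          # ~8.19x  (640 vs 224, batch 16)
print(20.93 / 6.78)            # ~3.09x  (640 vs 224, batch 1)
print(40.49 / 6.78)            # ~5.97x  (batch 16 vs 1, input 224)
print(331.74 / 20.93)          # ~15.85x (batch 16 vs 1, input 640)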
Reproduction:
Code:
import time
from statistics import mean

import torch
import torchvision.models as models

if __name__ == "__main__":
    if not torch.cuda.is_available():
        print("cuda not available")
        exit()

    device = torch.device("cuda")
    model = models.resnet50()
    model = model.to(device)
    model.eval()

    input_sizes = [224, 640]
    batch_sizes = [1, 4, 16]

    for input_size in input_sizes:
        print(f"\ninput size: {input_size}")
        for batch_size in batch_sizes:
            # Warm-up
            inputs = torch.randn(
                batch_size, 3, input_size, input_size
            ).to(device)
            with torch.no_grad():
                _ = model(inputs)

            measures = []
            for _ in range(100):
                inputs = torch.randn(
                    batch_size, 3, input_size, input_size
                ).to(device)
                start_time = time.time()
                with torch.no_grad():
                    _ = model(inputs)
                # Wait for the GPU to finish before stopping the timer
                torch.cuda.synchronize()
                elapsed_time = time.time() - start_time
                measures.append(elapsed_time)

            mean_measure = mean(measures)
            print(
                f"  batch size: {batch_size}: "
                f"{mean_measure * 1e3:.2f} (ms)"
            )
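In case the wall-clock timing itself is a factor, here is a sketch of the same inner loop using torch.cuda.Event and a longer warm-up (illustrative only; the numbers above come from the code as posted, and time_forward is just a name I made up here):

def time_forward(model, inputs, iters=100, warmup=10):
    # Extra warm-up iterations so lazy initialization and cuDNN
    # algorithm selection do not leak into the measurements.
    with torch.no_grad():
        for _ in range(warmup):
            _ = model(inputs)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    times_ms = []
    with torch.no_grad():
        for _ in range(iters):
            start.record()
            _ = model(inputs)
            end.record()
            torch.cuda.synchronize()
            times_ms.append(start.elapsed_time(end))  # milliseconds
    return sum(times_ms) / len(times_ms)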
Environment: Google Colab with a T4 GPU. I observed the same behavior on an L4 GPU with the pytorch/pytorch:2.4.1-cuda12.1-cudnn9-devel Docker image.
I also monitored GPU utilization and memory usage on the L4 and noticed it was using about 70% of the GPU and ~10% of the memory with the biggest batch size and the biggest input size.
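For reference, this can be checked with nvidia-smi or, from Python, roughly like this (assuming pynvml is installed so that torch.cuda.utilization works):

# Rough check of GPU utilization and memory usage from Python.
free, total = torch.cuda.mem_get_info(device)
print(f"GPU utilization: {torch.cuda.utilization(device)}%")
print(f"memory used: {100 * (total - free) / total:.1f}%")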
I also tried different networks, even a single Conv2d layer, and saw the same behavior; a minimal variant looks roughly like the sketch below.
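# Hypothetical minimal variant: swap the ResNet-50 for a single
# convolution (channel counts here are arbitrary, for illustration only);
# the rest of the benchmark loop stays the same.
model = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)
model = model.to(device)
model.eval()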
Is that behaviour expected?