ResNet-50 inference speed

ResNet-50 FP32 PyTorch inference speed: 6.5 ms on an RTX 3060 with a 64x64x3 input
ResNet-50 FP32 PyTorch inference speed: 6.5 ms on an RTX 3060 with a 224x224x3 input
When I benchmark VGG16 with a 224x224x3 input I also get the same speed, even though VGG16 is roughly a 4x-5x larger model.
Why are both the same?

Your workload might be CPU-limited, e.g. caused by a generally slow CPU, heavy CPU processing, data loading, etc., which would starve your GPU. Without any details it's pure speculation.
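One quick sanity check (a minimal sketch, assuming the torchvision ResNet-50 at batch size 1; the sizes are arbitrary): time the model at increasing input resolutions. If the measured time stays flat while the compute grows, the runs are most likely launch-/CPU-bound rather than compute-bound.

import time

import torch
import torchvision

model = torchvision.models.resnet50().cuda().eval()

for size in (64, 224, 448):
    x = torch.randn(1, 3, size, size, device="cuda")
    with torch.no_grad():
        for _ in range(10):  # warm up
            model(x)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(100):
            model(x)
        torch.cuda.synchronize()
        # If this number barely changes across sizes, the CPU-side kernel
        # launch overhead dominates and the GPU is mostly idle.
        print(f"{size}x{size}: {(time.time() - t0) / 100 * 1000:.2f} ms/iter")

At batch size 1 with small inputs, a fast GPU can often finish each kernel before the CPU has queued the next one, so the wall-clock time reflects launch overhead rather than model size.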

import time

import numpy as np
import torch


def benchmark(model, input_shape=(1024, 1, 32, 32), dtype='fp32', nwarmup=50, nruns=1000):
    # Note: input_shape must match what the model expects,
    # e.g. (N, 3, 224, 224) for torchvision ResNet-50/VGG16.
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype == 'fp16':
        input_data = input_data.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    # Wait for all warm-up kernels to finish before starting the timers.
    torch.cuda.synchronize()

    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns + 1):
            start_time = time.time()
            output = model(input_data)
            # Synchronize so the host timer measures the actual GPU execution,
            # not just the asynchronous kernel launches.
            torch.cuda.synchronize()
            end_time = time.time()
            timings.append(end_time - start_time)
            if i % 100 == 0:
                print('Iteration %d/%d, avg batch time %.2f ms' % (i, nruns, np.mean(timings) * 1000))

    print("Input shape:", input_data.size())
    print("Output shape:", output.shape)
    print('Average batch time: %.2f ms' % (np.mean(timings) * 1000))

This is the code I am using for benchmarking.
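For reference, this is roughly how it would be called for the two models above (a minimal usage sketch, assuming the torchvision model definitions and batch size 1; the models must be on the GPU and in eval mode):

import torch
import torchvision

resnet = torchvision.models.resnet50().cuda().eval()
benchmark(resnet, input_shape=(1, 3, 224, 224))

vgg = torchvision.models.vgg16().cuda().eval()
benchmark(vgg, input_shape=(1, 3, 224, 224))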

Use a visual profiler to compare the timelines of the runs and check the kernel execution times, the kernel launches, and potential CPU-side bottlenecks.
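One option is the built-in torch.profiler, which gives both a kernel summary table and a Chrome trace of the timeline (a minimal sketch, assuming the torchvision ResNet-50; any model and input would work the same way):

import torch
import torchvision
from torch.profiler import profile, ProfilerActivity

model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):  # warm up before profiling
        model(x)
    torch.cuda.synchronize()
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(10):
            model(x)
        torch.cuda.synchronize()

# Summary: compare the CPU-side op time against the actual GPU kernel time.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
# Timeline for chrome://tracing or Perfetto; gaps between GPU kernels
# indicate the GPU waiting on the CPU.
prof.export_chrome_trace("trace.json")

If the trace shows large gaps between GPU kernels while the CPU track is busy, the benchmark is CPU-limited and a larger model won't change the measured time.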