Bigger batch size slows down epoch training time

I am training a DeepLabV3 model with a MobileNetV3-Large backbone (taken directly from PyTorch's torchvision):

    import torchvision

    opt_model = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(
        progress=True,
        num_classes=1,
    ).cuda()

I have a custom dataset that I am using for body segmentation. When I increase the batch size from 32 images to 256, the time it takes to go through one epoch increases. I timed the parts of each iteration to see which took the longest, and it was these two lines:

            scaler.scale(loss).backward()  # type: ignore
            scaler.step(optimizer)
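
For context, these lines sit in a fairly standard mixed-precision training loop, roughly like the sketch below (the dummy batch, optimizer, and loss function here are placeholders, not my exact setup):

    import torch

    # dummy batch standing in for my custom body-segmentation dataset
    train_loader = [(torch.randn(8, 3, 224, 224),
                     torch.randint(0, 2, (8, 1, 224, 224)).float())]

    scaler = torch.cuda.amp.GradScaler()
    optimizer = torch.optim.Adam(opt_model.parameters(), lr=1e-4)
    criterion = torch.nn.BCEWithLogitsLoss()

    for images, masks in train_loader:
        images, masks = images.cuda(), masks.cuda()
        optimizer.zero_grad()
        # forward pass runs in mixed precision
        with torch.cuda.amp.autocast():
            output = opt_model(images)["out"]
            loss = criterion(output, masks)
        # the two lines that dominate the epoch time
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()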

I am not sure what's going on. My assumption was that with a bigger batch size, one epoch would be faster to complete.

Solution was found:

The default PyTorch model I was using has a bottleneck.

Did you create a visual profile to narrow down the bottleneck, or how did you isolate these lines?
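
If not, torch.profiler is one way to get such a profile. A generic sketch (a small stand-in model instead of your DeepLabV3 setup) would look like this:

    import torch
    from torch.profiler import profile, record_function, ProfilerActivity

    model = torch.nn.Linear(128, 1).cuda()   # stand-in for the real model
    x = torch.randn(256, 128, device="cuda")

    # profile the forward and backward pass on CPU and GPU
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with record_function("forward_backward"):
            loss = model(x).sum()
            loss.backward()

    # print a summary sorted by GPU time, or export a trace for chrome://tracing
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
    prof.export_chrome_trace("trace.json")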

I timed how long each line took to execute.

I still don’t know exactly how you’ve profiled the code, but be careful with host timers if you are using the GPU, since CUDA operations are executed asynchronously. You would thus need to synchronize the code before starting and stopping your timers.
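
A rough sketch of properly synchronized host-side timing (a matmul stands in for your backward and step calls):

    import time
    import torch

    x = torch.randn(4096, 4096, device="cuda")

    torch.cuda.synchronize()            # make sure no GPU work is still pending
    start = time.perf_counter()

    y = x @ x                           # the operation being timed (placeholder)

    torch.cuda.synchronize()            # wait for the kernel to actually finish
    print(f"elapsed: {time.perf_counter() - start:.4f}s")

Without the synchronize() calls, the measured time mostly reflects the kernel launch overhead rather than the actual GPU work.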