Struggling to pick the right batch size

Training a CNN on image data keeps hitting GPU out-of-memory errors at larger batch sizes, but going smaller makes training really slow and kind of unstable.

Hello!

When large batch sizes cause GPU memory issues but small ones cause slowness and instability, the primary fix is to decouple the effective batch size from VRAM usage. Implement Gradient Accumulation, which processes several small micro-batches before performing one large weight update, simulating a bigger batch size. Additionally, Mixed Precision Training (FP16) roughly halves the activation memory footprint, freeing capacity on your GPU and often speeding up training on modern hardware.
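Here's a minimal sketch of the two techniques combined in PyTorch. The toy CNN, the random tensors, and the values of `accum_steps` and `micro_batch` are all placeholders standing in for your real model and DataLoader, not a definitive setup:

```python
import torch
import torch.nn as nn

# Toy CNN and random data stand in for your real model and loader.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # guards FP16 gradients against underflow

accum_steps = 4   # micro-batches accumulated per weight update
micro_batch = 8   # small enough to fit in VRAM; effective batch = 32

model.train()
optimizer.zero_grad()
for step in range(100):  # stand-in for iterating a real DataLoader
    images = torch.randn(micro_batch, 3, 32, 32, device="cuda")
    labels = torch.randint(0, 10, (micro_batch,), device="cuda")

    # Run the forward pass in mixed precision to shrink activation memory.
    with torch.cuda.amp.autocast():
        loss = criterion(model(images), labels)

    # Divide by accum_steps so the summed gradients average out
    # as if they came from one large batch.
    scaler.scale(loss / accum_steps).backward()

    # Step the optimizer only once every accum_steps micro-batches.
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

One design note: scaling the loss by `1 / accum_steps` before `backward()` is what makes the accumulated update equivalent to averaging over the larger effective batch, so your learning rate behaves as if you were actually training at the bigger size.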

Did you personally see any slowdown or memory quirks when using gradient accumulation together with mixed precision?