ImageNet example does not honor bucket_cap_mb

Running the ImageNet example on a single node with 4 GPUs issues 3 NCCL AllReduce ops per mini-batch for gradient synchronization, with sizes of 2052000, 28852224, and 15853824 bytes.

I assumed each op would respect the bucket_cap_mb limit, i.e. that no AllReduce would be larger than 25 MB (the default). The second op, at 28852224 bytes, is larger than that.
Am I missing something?
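
For reference, here is a quick sanity check of the three sizes against the cap. It is only arithmetic on the numbers above. It assumes "25 MB" means 25 * 1024 * 1024 bytes, and the names `allreduce_sizes`, `cap_bytes`, and `over_cap` are made up for illustration.

```python
# Byte sizes of the three AllReduce calls reported above.
allreduce_sizes = [2052000, 28852224, 15853824]

# Assumption: the 25 MB default is read as 25 MiB (25 * 1024 * 1024 bytes).
bucket_cap_mb = 25
cap_bytes = bucket_cap_mb * 1024 * 1024


def over_cap(sizes, cap):
    """Return the sizes that are strictly larger than the cap."""
    return [s for s in sizes if s > cap]


# Print each size in MiB and whether it is within the assumed cap.
for size in allreduce_sizes:
    status = "over" if size > cap_bytes else "within"
    print(f"{size:>10} bytes = {size / 2**20:6.2f} MiB ({status} the cap)")

exceeding = over_cap(allreduce_sizes, cap_bytes)
```

Under that assumption only the second op (28852224 bytes) exceeds the cap. It would also exceed it if 25 MB were read as 25,000,000 bytes.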