I was trying to test the TorchDynamo Benchmark Suite.
In the model configuration files, specific batch sizes are pinned per model. For example:
- torchbench.yml
```yaml
batch_size:
  training:
    demucs: 4
    dlrm: 1024
    densenet121: 4
    hf_Reformer: 4
    hf_T5_base: 4
    timm_efficientdet: 1
    llama_v2_7b_16h: 1  # reduced from 16 due to cudagraphs OOM in TorchInductor dashboard
    yolov3: 8
  inference:
    timm_efficientdet: 32
```
- huggingface.py
```python
model.gradient_checkpointing_enable()
if model_name in BATCH_SIZE_KNOWN_MODELS:
    batch_size_default = BATCH_SIZE_KNOWN_MODELS[model_name]
elif batch_size is None:
    batch_size_default = 16
    log.info(
        f"Batch size not specified for {model_name}. Setting batch_size=16"
    )
if batch_size is None:
    batch_size = batch_size_default
    if model_name in BATCH_SIZE_DIVISORS:
        batch_size = max(int(batch_size / BATCH_SIZE_DIVISORS[model_name]), 1)
        log.info(
            f"Running smaller batch size={batch_size} for {model_name}, orig batch_size={batch_size_default}"
        )
```
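To make sure I'm reading the resolution logic correctly, here is a minimal standalone sketch of it; the dictionary entries are made-up examples, not the actual values shipped in huggingface.py:

```python
# Hypothetical example entries -- the real tables live in huggingface.py.
BATCH_SIZE_KNOWN_MODELS = {"BertForMaskedLM": 32}
BATCH_SIZE_DIVISORS = {"BertForMaskedLM": 2}


def resolve_batch_size(model_name, batch_size=None):
    """Sketch of the batch-size resolution flow quoted above."""
    # A known model gets its pinned default; otherwise fall back to 16.
    if model_name in BATCH_SIZE_KNOWN_MODELS:
        batch_size_default = BATCH_SIZE_KNOWN_MODELS[model_name]
    else:
        batch_size_default = 16
    # The divisor only kicks in when no explicit batch size was passed.
    if batch_size is None:
        batch_size = batch_size_default
        if model_name in BATCH_SIZE_DIVISORS:
            # Shrink the default, clamping at a minimum of 1.
            batch_size = max(int(batch_size / BATCH_SIZE_DIVISORS[model_name]), 1)
    return batch_size


print(resolve_batch_size("BertForMaskedLM"))     # 32 / divisor 2 -> 16
print(resolve_batch_size("UnknownModel"))        # fallback default -> 16
print(resolve_batch_size("BertForMaskedLM", 8))  # explicit size wins -> 8
```

So an explicitly passed batch size bypasses both the per-model default and the divisor, which is why my question is about where those defaults come from.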
In the above cases, a specific batch size is associated with each model name (and in huggingface.py, BATCH_SIZE_DIVISORS is additionally used to divide the default batch size). Is there a specific reason for choosing those particular "default" batch sizes in the first place? For example, are these the batch sizes that perform best with TorchDynamo?
Is the selection of these batch sizes arbitrary or intentional?
Also, in the accuracy tests, I find that using a batch size larger than the recommended one leads to accuracy failures.