I am fine-tuning a masked language model from XLM-RoBERTa large on Google Cloud machines.
I ran a couple of experiments and was surprised by some of the results.
"a2-highgpu-4g" ,accelerator_count=4, accelerator_type="NVIDIA_TESLA_A100" on 4,12,672 data batch size 4 Running ( 4 data*4 GPU=16 data points)
"a2-highgpu-4g" ,accelerator_count=4 , accelerator_type="NVIDIA_TESLA_A100"on 4,12,672 data batch size 8 failed
"a2-highgpu-4g" ,accelerator_count=4, accelerator_type="NVIDIA_TESLA_A100" on 4,12,672 data batch size 16 failed
"a2-highgpu-4g" ,accelerator_count=4.,accelerator_type="NVIDIA_TESLA_A100" on 4,12,672 data batch size 32 failed
I was not able to train the model with a batch size larger than 4 per GPU; the job stopped midway.
Here is the code I am using.
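For reference, model and data_collator in the Trainer below are built roughly like this (just a sketch of my setup; tokenization and dataset preparation for train_data are omitted):

import transformers as tr

# Sketch: XLM-RoBERTa-large with the standard masked-LM collator (masks 15% of tokens)
tokenizer = tr.AutoTokenizer.from_pretrained("xlm-roberta-large")
model = tr.AutoModelForMaskedLM.from_pretrained("xlm-roberta-large")
data_collator = tr.DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)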
import transformers as tr

training_args = tr.TrainingArguments(
    # disable_tqdm=True,
    output_dir='/home/pc/Bert_multilingual_exp_TCM/results_mlm_exp2',
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=4,  # batch size per GPU
    prediction_loss_only=True,
    save_strategy="no",
    run_name="MLM_Exp1",
    learning_rate=2e-5,
    logging_dir='/home/pc/Bert_multilingual_exp_TCM/logs_mlm_exp1',  # directory for storing logs
    logging_steps=40000,
    logging_strategy='no',  # 'no' disables logging, so logging_steps has no effect
)

trainer = tr.Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_data,
)
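One direction I am considering (but have not verified on this machine) is keeping the per-GPU batch at 4 and adding gradient accumulation plus mixed precision, roughly like this; the effective batch size is per_device_train_batch_size * number of GPUs * gradient_accumulation_steps:

# Sketch of memory-saving options (not verified on my setup)
training_args = tr.TrainingArguments(
    output_dir='/home/pc/Bert_multilingual_exp_TCM/results_mlm_exp2',
    num_train_epochs=2,
    per_device_train_batch_size=4,   # per-GPU batch that currently fits in memory
    gradient_accumulation_steps=4,   # effective batch = 4 * 4 GPUs * 4 = 64
    fp16=True,                       # mixed precision on the A100s to reduce activation memory
    gradient_checkpointing=True,     # recompute activations to save memory (slower per step)
    learning_rate=2e-5,
    save_strategy="no",
)

Gradient checkpointing trades compute for memory, so I expect it to slow each step down, but I am not sure this is the right approach here.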
My questions:

How can I train with a larger batch size on the a2-highgpu-4g machine?
Which parameters can I include in TrainingArguments so that training is faster and uses less memory?
Thanks in advance.
Versions
torch==1.11.0+cu113
torchvision==0.12.0+cu113
torchaudio==0.11.0+cu113
transformers==4.17.0