I am training an ALBERT language model with the Hugging Face Transformers library. While training on my p3dn instance, I notice that GPU 0's memory is almost completely used, while the other GPUs have roughly 50% of their memory unused. The largest batch size I can fit on this system is 85; anything above that gives an OOM error.
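For context, here is a rough sketch of the training script; the model config, file paths, and hyperparameters below are placeholders rather than my exact values, and the argument names assume the Trainer API of a recent transformers release:

from transformers import (
    AlbertConfig,
    AlbertForMaskedLM,
    AlbertTokenizer,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# Placeholder path to a single SentencePiece file (this is what triggers
# the from_pretrained() deprecation warning shown further down).
tokenizer = AlbertTokenizer.from_pretrained("spiece.model")
model = AlbertForMaskedLM(AlbertConfig())  # training from scratch

dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="albert_lm",
    per_device_train_batch_size=85,  # placeholder; this is roughly where GPU 0 hits OOM
    num_train_epochs=1,
)

Trainer(
    model=model,
    args=training_args,
    data_collator=collator,
    train_dataset=dataset,
).train()

This is the nvidia-smi output while training: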
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:16.0 Off | 0 |
| N/A 77C P0 291W / 300W | 30931MiB / 32510MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:17.0 Off | 0 |
| N/A 71C P0 255W / 300W | 18963MiB / 32510MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:00:18.0 Off | 0 |
| N/A 71C P0 95W / 300W | 18963MiB / 32510MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:00:19.0 Off | 0 |
| N/A 68C P0 89W / 300W | 18963MiB / 32510MiB | 72% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... On | 00000000:00:1A.0 Off | 0 |
| N/A 68C P0 78W / 300W | 18963MiB / 32510MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... On | 00000000:00:1B.0 Off | 0 |
| N/A 69C P0 96W / 300W | 18963MiB / 32510MiB | 65% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... On | 00000000:00:1C.0 Off | 0 |
| N/A 69C P0 79W / 300W | 18963MiB / 32510MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... On | 00000000:00:1D.0 Off | 0 |
| N/A 74C P0 80W / 300W | 18963MiB / 32510MiB | 12% Default |
+-------------------------------+----------------------+----------------------+
I was using the default settings, which rely on data parallelism.
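As far as I can tell, with more than one visible GPU the Trainer falls back to torch.nn.DataParallel, which is roughly equivalent to the following sketch:

import torch
from transformers import AlbertConfig, AlbertForMaskedLM

model = AlbertForMaskedLM(AlbertConfig()).to("cuda:0")
if torch.cuda.device_count() > 1:
    # DataParallel replicates the model onto all 8 GPUs and scatters each
    # batch across them, but it gathers every replica's outputs back onto
    # cuda:0 and keeps the master copy of the weights (and the optimizer
    # state) there, which I assume is why GPU 0 fills up long before the
    # other GPUs do.
    model = torch.nn.DataParallel(model)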
I also tried distributed training with python -m torch.distributed.launch --nproc_per_node 8 test_lm.py, but it started a separate job for each and every GPU (logs below):
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Calling AlbertTokenizer.from_pretrained() with the path to a single file or url is deprecated
/language_model/lm/lib/python3.6/site-packages/transformers/tokenization_utils.py:830: FutureWarning: Parameter max_len is deprecated and will be removed in a future release. Use model_max_length instead.
  category=FutureWarning,
(the two warnings above are printed once per spawned process, eight times in total)
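From the PyTorch documentation, my understanding is that the launcher starts one process per GPU by design, and each process is expected to read the --local_rank argument it is given and wrap the model in DistributedDataParallel. A sketch with placeholder names (not my actual test_lm.py):

import argparse
import torch
import torch.distributed as dist
from transformers import AlbertConfig, AlbertForMaskedLM

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)  # injected by torch.distributed.launch
args = parser.parse_args()

# Pin this process to its own GPU and join the process group
# (MASTER_ADDR, RANK, WORLD_SIZE, etc. are set by the launcher).
torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")

device = torch.device("cuda", args.local_rank)
model = AlbertForMaskedLM(AlbertConfig()).to(device)
model = torch.nn.parallel.DistributedDataParallel(
    model, device_ids=[args.local_rank], output_device=args.local_rank
)
# The training loop would go here; each process trains on its own shard
# of the data (e.g. via torch.utils.data.distributed.DistributedSampler),
# so memory use should stay balanced across GPUs instead of piling up
# on GPU 0.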
Can anyone suggest what I should do for efficient multi-GPU training?