Hi.
I’m trying to use the MarianMT models from the Hugging Face Transformers library for back-translation. I’m running on 8 GPUs, but nvidia-smi shows utilization on only one of them even though I wrapped the models in nn.DataParallel(). Can anyone help me with this issue?
import numpy as np
import torch
import torch.nn as nn
from transformers import MarianMTModel, MarianTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Candidate target languages for the multilingual en-ROMANCE model;
# one is sampled at random per batch below
target_langs = ['fr', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo',
                'es', 'pt', 'gl', 'lad', 'an', 'mwl', 'it', 'co', 'nap', 'scn',
                'vec', 'sc', 'ro', 'la']
def translate(texts, model, tokenizer, language="fr"):
    with torch.no_grad():
        # Multilingual Marian models pick the target language from a
        # ">>lang<<" token prefixed to each source sentence
        template = lambda text: f"{text}" if language == "en" else f">>{language}<< {text}"
        src_texts = [template(text) for text in texts]
        encoded = tokenizer.prepare_seq2seq_batch(src_texts,
                                                  truncation=True,
                                                  max_length=300,
                                                  return_tensors="pt").to(device)
        # .module is needed because nn.DataParallel does not expose generate()
        translated = model.module.generate(**encoded)
        translated_texts = tokenizer.batch_decode(translated, skip_special_tokens=True)
        return translated_texts
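To make the language-token mechanics concrete, here is what the templated inputs look like for a hypothetical batch (illustrative only, not part of my pipeline):

# Illustrative only: what src_texts holds for language="fr"
texts = ["I really enjoyed this movie."]
print([f">>fr<< {t}" for t in texts])
# ['>>fr<< I really enjoyed this movie.']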
def back_translate(texts, source_lang="en", target_lang="fr"):
    # Translate from source to target language
    fr_texts = translate(texts, target_model, target_tokenizer,
                         language=target_lang)
    # Translate from target language back to source language
    back_translated_texts = translate(fr_texts, en_model, en_tokenizer,
                                      language=source_lang)
    return back_translated_texts
# Multilingual ROMANCE models, so the ">>lang<<" tokens above are honored
target_model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
target_tokenizer = MarianTokenizer.from_pretrained(target_model_name)
target_model = MarianMTModel.from_pretrained(target_model_name)
en_model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
en_tokenizer = MarianTokenizer.from_pretrained(en_model_name)
en_model = MarianMTModel.from_pretrained(en_model_name)
target_model = nn.DataParallel(target_model)
target_model = target_model.to(device)  # same performance if I add .half()
target_model.eval()
en_model = nn.DataParallel(en_model)
en_model = en_model.to(device)  # same performance if I add .half()
en_model.eval()
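For reference, a quick way to check that PyTorch and DataParallel actually see all the cards (given the nvidia-smi output below, I would expect 8):

print(torch.cuda.device_count())  # should be 8 if all cards are visible
print(target_model.device_ids)  # GPUs nn.DataParallel will replicate onto
print(next(target_model.module.parameters()).device)  # where the weights live now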
for i, (x1, x2, label) in enumerate(loader):
    with torch.no_grad():
        # x1 and x2 are batches of strings
        bk_x1 = back_translate(x1, source_lang="en", target_lang=np.random.choice(target_langs))
        bk_x2 = back_translate(x2, source_lang="en", target_lang=np.random.choice(target_langs))
Here is the nvidia-smi output. Utilization is low because of the small batch size (16), but if I increase the batch size I get a CUDA out-of-memory error. I can also see that only one GPU does any processing, so it may be that the Marian model is not being parallelized correctly, which would explain both the out-of-memory error and the slow performance. If so, what would be the solution?
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:1B:00.0 Off | N/A |
| 42% 78C P2 199W / 250W | 9777MiB / 11178MiB | 91% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:1C:00.0 Off | N/A |
| 29% 36C P8 10W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:1D:00.0 Off | N/A |
| 31% 36C P8 9W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:1E:00.0 Off | N/A |
| 35% 41C P8 9W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 108... Off | 00000000:3D:00.0 Off | N/A |
| 29% 34C P8 9W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 108... Off | 00000000:3F:00.0 Off | N/A |
| 30% 31C P8 8W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX 108... Off | 00000000:40:00.0 Off | N/A |
| 31% 38C P8 9W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 108... Off | 00000000:41:00.0 Off | N/A |
| 30% 37C P8 9W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 58780 C python 10407MiB |
| 1 N/A N/A 58780 C python 0MiB |
| 2 N/A N/A 58780 C python 0MiB |
| 3 N/A N/A 58780 C python 0MiB |
| 4 N/A N/A 58780 C python 0MiB |
| 5 N/A N/A 58780 C python 0MiB |
| 6 N/A N/A 58780 C python 0MiB |
| 7 N/A N/A 58780 C python 0MiB |
+-----------------------------------------------------------------------------+
FYI, I’m using:
PyTorch 1.7.0
transformers 4.0.1
CUDA 10.1
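In case it helps frame an answer: my understanding is that nn.DataParallel only splits the batch inside forward(), so calling model.module.generate() bypasses it entirely and everything runs on the default GPU. A workaround I’m considering is to keep one plain replica per GPU and split each batch by hand. This is just a minimal sketch, untested at scale, and the generate() calls still run one after another from a single Python thread unless I add threading:

import math
import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'  # same model as above
tokenizer = MarianTokenizer.from_pretrained(model_name)

n_gpus = torch.cuda.device_count()
# One independent replica per card, so generate() can run on any of them
replicas = [MarianMTModel.from_pretrained(model_name).to(f"cuda:{i}").eval()
            for i in range(n_gpus)]

def generate_multi_gpu(src_texts):
    # Contiguous chunks keep the outputs aligned with the input order
    chunk_size = max(1, math.ceil(len(src_texts) / n_gpus))
    chunks = [src_texts[i:i + chunk_size]
              for i in range(0, len(src_texts), chunk_size)]
    outputs = []
    with torch.no_grad():
        for i, chunk in enumerate(chunks):
            enc = tokenizer.prepare_seq2seq_batch(
                chunk, truncation=True, max_length=300,
                return_tensors="pt").to(f"cuda:{i}")
            out = replicas[i].generate(**enc)
            outputs.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return outputs

Even run sequentially, this should cut the per-GPU batch size by a factor of n_gpus, which is what I’d hope fixes the out-of-memory error. Does this look like the right direction, or is there a cleaner way to parallelize generate()?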