Error When Using Multiple GPUs

My code works fine on a single GPU when I use torch.cuda.set_device(0), but training on one GPU takes a lot of time.

I tried various ways to parallelize it, but nothing seems to work.
Currently I am trying:

gpu_ids = [0,1,2,3]
model = torch.nn.DataParallel(model,device_ids=gpu_ids)

But when I try to access some model attributes, like model.encoder.eval(), it throws an error saying 'DataParallel' object has no attribute 'encoder'.

What am I missing?

After wrapping the model in DataParallel, the original model is stored in model.module.
Try model.module.encoder.
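A minimal sketch of the answer above, assuming a hypothetical model class with an encoder submodule (Seq2Seq, encoder and decoder are placeholder names, not taken from the original code). It also runs on a CPU-only machine, where DataParallel simply stores the module and calls it directly.

```python
import torch.nn as nn


class Seq2Seq(nn.Module):
    # Hypothetical stand-in for the real model, with an `encoder` submodule.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 4)
        self.decoder = nn.Linear(4, 8)

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = nn.DataParallel(Seq2Seq())

# The wrapper itself has no `encoder` attribute (same error as in the question)...
try:
    model.encoder
except AttributeError as err:
    print(err)

# ...but the wrapped model is reachable through `.module`.
encoder = model.module.encoder
encoder.eval()
```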


Hey, that seems to solve my problem, but do I have to replace every instance of model.“abc” with model.module.“abc”, like:

model.train() to model.module.train()
model.eval() to model.module.eval() and so on?

Please let me know if there are any exceptions to this.

@ptrblck Now it says cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/ and only GPU 0 is used.

Does DataParallel execute batches in parallel and merge the results?

No, you don't need model.module for that; just call .eval() (or .train()) on the DataParallel instance.
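A sketch of why no .module is needed here, assuming a hypothetical model Net with a dropout layer: nn.Module.eval() and .train() recurse into every submodule, so calling them on the wrapper switches the wrapped model too. The helper all_in_mode is also hypothetical and only checks the training flags.

```python
import torch.nn as nn


class Net(nn.Module):
    # Hypothetical model; Dropout behaves differently in train and eval mode.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))

    def forward(self, x):
        return self.encoder(x)


def all_in_mode(module: nn.Module, training: bool) -> bool:
    # True if the module and every one of its submodules has the requested flag.
    return all(m.training == training for m in module.modules())


model = nn.DataParallel(Net())

model.eval()   # called on the wrapper, no `.module` needed
print(all_in_mode(model, training=False))  # True

model.train()  # switches the wrapper and the wrapped model back
print(all_in_mode(model, training=True))  # True

# The wrapper does show up in state_dict() keys, which gain a "module." prefix.
print(list(model.state_dict().keys()))  # ['module.encoder.0.weight', 'module.encoder.0.bias']
```

The state_dict() prefix is one place where the wrapper is visible: saving model.module.state_dict() instead keeps the checkpoint loadable by an unwrapped model.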

DataParallel uses a bit more memory on the default GPU, which is GPU 0 unless you specify otherwise. If you are using this GPU for other processes (e.g. your desktop), you could change the order of the device IDs, like: device_ids=[1, 0].
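A sketch of the device-order suggestion, using a hypothetical helper named wrap_data_parallel. DataParallel expects the module's parameters to be on device_ids[0], which is also the default output_device, so the helper moves the model there before wrapping. If the requested GPUs are not all visible, it returns the model unchanged.

```python
from typing import Sequence

import torch
import torch.nn as nn


def wrap_data_parallel(model: nn.Module, device_ids: Sequence[int]) -> nn.Module:
    """Hypothetical helper: wrap `model` in DataParallel on `device_ids`.

    The first ID is the primary device: it holds the parameters, receives the
    gathered outputs, and therefore uses a bit more memory than the others.
    """
    ids = list(device_ids)
    visible = torch.cuda.device_count()
    if len(ids) < 2 or not all(0 <= i < visible for i in ids):
        return model  # not enough GPUs visible: keep the plain model
    model = model.to(torch.device("cuda", ids[0]))
    return nn.DataParallel(model, device_ids=ids)


if __name__ == "__main__":
    net = nn.Linear(300, 3)
    # Put the extra memory load on GPU 1 instead of GPU 0.
    net = wrap_data_parallel(net, device_ids=[1, 0])
    print(type(net).__name__)
```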

Previously I was calling model.encode.eval(); now I call model.module.encode.eval(). Is that the right way?

But nvidia-smi shows that only one GPU is being used. Any idea why? The code works fine on a single GPU. Could you have a look and suggest what changes to make? One version contains the changes I made after your suggestion. Please let me know where I went wrong.

Please help me out, it's urgent.

Remove torch.cuda.set_device(gpu) and try to use DataParallel again.
Also, could you delete loaded_model? It seems to use some GPU memory without being used.
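Since nvidia-smi shows only one busy GPU, a quick first check is what the PyTorch process itself can see. A sketch, with gpu_report as a hypothetical function name: DataParallel uses all visible GPUs when device_ids is omitted, so if device_count is 1 here (for example because CUDA_VISIBLE_DEVICES lists a single ID), there is nothing to parallelize across.

```python
import os

import torch


def gpu_report() -> dict:
    """Hypothetical helper: collect the GPU information visible to this process."""
    count = torch.cuda.device_count()
    return {
        "cuda_available": torch.cuda.is_available(),
        "device_count": count,
        # If set to a single ID, only one GPU is visible to the process.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "names": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```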


Yes, I ran it after removing torch.cuda.set_device(gpu). It runs, but still uses only one GPU. What I would like to do is run batch 1 on one GPU and batch 2 on another GPU, then merge the results. Am I supposed to do that manually?

Even training for 8 epochs took me 24 hours on one GPU. I would like to speed this up.

DataParallel does this automatically: each batch is split across the GPUs, and the outputs are gathered back on the default device.
I still don't see why your code only uses one GPU.
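To illustrate the split: DataParallel scatters its inputs along dim 0 (its dim argument) into one chunk per device, essentially the same chunking as torch.chunk, runs one replica per chunk, and concatenates the outputs on the output device. The sketch below reproduces only the chunk sizes on the CPU, with a hypothetical batch of 64 examples. One consequence worth checking: a batch with fewer examples than GPUs (for example a batch size of 1) produces fewer chunks, so only that many GPUs get work.

```python
import torch

batch = torch.randn(64, 300)  # hypothetical batch: 64 examples, 300 features each
num_gpus = 4                  # e.g. device_ids=[0, 1, 2, 3]

# Split along dim 0 the way DataParallel's scatter step does by default.
chunks = torch.chunk(batch, num_gpus, dim=0)
print([c.size(0) for c in chunks])  # [16, 16, 16, 16]

# Each replica would process one chunk; the results are concatenated back.
gathered = torch.cat(chunks, dim=0)
print(gathered.shape)  # torch.Size([64, 300])

# A batch with a single example yields a single chunk, i.e. only one busy GPU.
print(len(torch.chunk(torch.randn(1, 300), num_gpus, dim=0)))  # 1
```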


No idea. That's why I shared the code here, in case I am missing some small detail.

Since I cannot run the code on my machine, could you create a small code snippet with random input etc. so that I could debug the code?
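A minimal random-input snippet of the kind requested, as a sketch: ToyModel and run_repro are hypothetical names, not part of the project. The forward pass prints the device and batch size each replica receives, so with four visible GPUs and a batch of 64 you would expect four lines reporting 16 examples each; a single line reporting 64 means the batch is not being split. On a CPU-only machine it runs unwrapped.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for the real model: a single linear layer."""

    def __init__(self, in_features: int = 10, out_features: int = 2):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each replica reports the slice of the batch it received.
        print(f"replica on {x.device}: batch size {x.size(0)}")
        return self.fc(x)


def run_repro(batch_size: int = 64) -> torch.Tensor:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = ToyModel().to(device)
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # uses all visible GPUs
    x = torch.randn(batch_size, 10, device=device)  # random input
    out = model(x)  # call the wrapper itself, not model.module
    print(f"gathered output: {tuple(out.shape)} on {out.device}")
    return out


if __name__ == "__main__":
    run_repro()
```

If this snippet does use all GPUs but the real code does not, a likely culprit is calling the wrapped model's parts directly in the training loop (for example model.module.encode(x) or model.module(x)), since only calls to the DataParallel object itself are split across devices.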

Since the project contains a lot of files, creating a small snippet seems difficult to me. Can you suggest how to debug the code for multi-GPU support?

Based on your experience with PyTorch and CUDA, can you think of any other reason why DataParallel would not work?

@ptrblck Thanks a lot for taking the time to help me out :)

Hey, I am using as a starting point for my project. Being able to run it on multiple GPUs would be very helpful.

Can you have a look?

Could you walk me through the code a bit, so that it won't take that much time to read all the functions?
First, I suppose I have to run and then just

Just run only. There are many extra models I tried, so the other names may be misleading.

Look only at and the different PyTorch modules called from there.

Thanks for the info. Unfortunately, I need some additional files:

  • glove.42B.300d
  • .vector_cache/
  • .data/snli/snli_1.0_entail/snli_1.0_train_entail.jsonl

Could I create some of them with random values? I don't need accuracy, just executable code.

Make snli_1.0_train_entail.jsonl a line-separated file (one JSON object per line) in the required directory, like:

{"sentence1": "Ram is Good", "sentence2": "Shyam is good", "label": "ab"}
{"sentence1": "Food was awesome", "sentence2": "Any sentence", "label": "bc"}

sentence1, sentence2, and label are the keys. The same goes for the test and dev files.
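A sketch for generating such files with random sentences. The word list, labels, line count and seed are arbitrary assumptions; only the three keys and the train file path come from the thread. The dev and test file names follow the same pattern but are guesses, so adjust them to whatever the loader expects.

```python
import json
import random
from pathlib import Path

# Hypothetical vocabulary and labels: the values only need the right shape.
WORDS = ["ram", "shyam", "food", "was", "is", "good", "awesome", "any", "sentence"]
LABELS = ["ab", "bc"]


def random_sentence(rng: random.Random, length: int = 5) -> str:
    return " ".join(rng.choice(WORDS) for _ in range(length))


def write_dummy_snli(path: Path, n_lines: int = 100, seed: int = 0) -> None:
    """Write one JSON object per line with keys sentence1, sentence2, label."""
    rng = random.Random(seed)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for _ in range(n_lines):
            record = {
                "sentence1": random_sentence(rng),
                "sentence2": random_sentence(rng),
                "label": rng.choice(LABELS),
            }
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    root = Path(".data/snli/snli_1.0_entail")
    # Only the train file name appears in the thread; dev/test names are assumptions.
    for split in ["train", "dev", "test"]:
        write_dummy_snli(root / f"snli_1.0_{split}_entail.jsonl")
```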

For the other files, just use random values.
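For the embeddings, a sketch that writes a small file in the plain-text GloVe layout: one word per line followed by its vector components, separated by spaces. The vocabulary, the .txt suffix and the .vector_cache location are assumptions. Whether the project's loader accepts this file, or still tries to download the original glove.42B.300d, depends on how it is configured.

```python
import random
from pathlib import Path
from typing import Iterable


def write_dummy_glove(path: Path, words: Iterable[str], dim: int = 300, seed: int = 0) -> None:
    """Write `word v1 v2 ... v_dim` per line with random values in [-1, 1]."""
    rng = random.Random(seed)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for word in words:
            values = " ".join(f"{rng.uniform(-1, 1):.5f}" for _ in range(dim))
            f.write(f"{word} {values}\n")


if __name__ == "__main__":
    vocab = ["ram", "shyam", "food", "was", "is", "good", "awesome", "any", "sentence"]
    # Hypothetical location and suffix; point it wherever the loader expects glove.42B.300d.
    write_dummy_glove(Path(".vector_cache/glove.42B.300d.txt"), vocab)
```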