DataParallel uses a bit more memory on the default device, which is GPU0 unless you change it. If you are using this GPU for other processes, e.g. your desktop, you can change the order of the device ids, e.g. device_ids=[1, 0].
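For reference, a minimal sketch of swapping the device order (the model and tensor sizes here are made up for illustration; the CUDA branch is guarded so the snippet also runs on a CPU-only machine):

```python
import torch
import torch.nn as nn

# hypothetical tiny model, just to illustrate the wrapping
model = nn.Linear(10, 2)
x = torch.randn(8, 10)

if torch.cuda.device_count() > 1:
    # GPU1 becomes the "primary" replica, so GPU0 keeps more
    # memory free for other processes (e.g. the desktop)
    model = nn.DataParallel(model, device_ids=[1, 0])
    model.cuda(1)  # parameters must live on device_ids[0]
    x = x.cuda(1)

out = model(x)  # scattered across GPUs if available, else plain forward
```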
The code at https://ideone.com/gJVwSk works fine for 1 GPU. Can you have a look and suggest what changes to make? https://ideone.com/YyBOa0 is the version I changed after your suggestion… Please point out where I went wrong…
Remove torch.cuda.set_device(gpu) and try to use DataParallel again.
Also, could you delete loaded_model? It seems to use some GPU memory without being used.
Yeah, I ran it after removing torch.cuda.set_device(gpu). It runs, but uses only 1 GPU. What I would like to do is run the first batch on one GPU and the second batch on another GPU, then merge the results together. Am I supposed to do that manually?
Even training for 8 epochs took 24 hours with 1 GPU. I would like to speed this up.
Since the project contains a lot of files, creating a small snippet seems difficult to me. Can you suggest how to debug the code for multi-GPU support?
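One common way to debug this outside the full project is a tiny self-contained probe (the Probe module below is hypothetical, not taken from the project): each DataParallel replica prints the batch slice it receives, so you can see directly whether the batch is being split across devices. The CUDA path is guarded, so the snippet also runs on a CPU-only box (where it simply reports the full batch):

```python
import torch
import torch.nn as nn

class Probe(nn.Module):
    """Tiny module that reports the per-replica batch size it sees."""
    def forward(self, x):
        # with DataParallel on 2 GPUs, each replica should see
        # roughly half of the full batch along dim 0
        print(f'device={x.device}, batch={x.size(0)}')
        return x * 2

model = Probe()
x = torch.randn(16, 4)

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()
    x = x.cuda()

out = model(x)  # gathered back to a single tensor on the primary device
```

If each replica prints a smaller batch on a different device, DataParallel is working; if only one device ever prints, the wrapping or device setup is the problem.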
Can you think of any other reason why DataParallel is not working, based on your experience with PyTorch and CUDA?
@ptrblck Thanks a lot for taking time to help me out
Could you walk me through the code a bit, so that it doesn’t take that much time to read all the functions?
First I suppose I have to run gte_vae_pretrain.py and then just gte.py?