Is it possible to do this in PyTorch?
If it is not supported yet, are there any plans to support it?
Also, when I used DataParallel, it seemed to me that PyTorch used only one thread.
The first GPU consumed almost all of its memory, while the others used only about half of theirs.
Is this normal?
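For context, here is a minimal sketch of the kind of DataParallel setup I mean (the model and tensor sizes are made up for illustration). As I understand it, DataParallel scatters each batch across the GPUs but gathers the outputs and gradients back onto the first GPU, which would explain the higher memory use there:

```python
import torch
import torch.nn as nn

# Assumed toy model and batch size, just to illustrate the setup.
model = nn.Linear(512, 10)
x = torch.randn(32, 512)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the model on every visible GPU, splits the
    # input batch across them, and gathers results on device 0 -- so
    # GPU 0 typically shows noticeably higher memory usage than the rest.
    model = nn.DataParallel(model).cuda()
    x = x.cuda()

out = model(x)
print(out.shape)  # expected: torch.Size([32, 10])
```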