I tried parallelizing my training across multiple GPUs using DataParallel on two GTX 1080s.
The training hangs right after it starts, and I cannot even kill the Docker container it is running in.
The same code runs without any problems on a single GPU.
I already tried the solutions described here and here; neither helped.
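For context, this is roughly how I wrap the model; the model, shapes, and batch below are placeholders for illustration, not my actual code:

```python
import torch
import torch.nn as nn

# Placeholder model -- my real model is larger, but the wrapping is the same.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Wrap the model so each batch is split across both GPUs (device_ids 0 and 1).
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch just to show the training step that hangs for me.
inputs = torch.randn(64, 128).cuda()
targets = torch.randint(0, 10, (64,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)  # hangs here when both GPUs are used
loss.backward()
optimizer.step()
```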
When I run p2pBandwidthLatencyTest, I get the following output:
[P2P (Peer-to-Peer) GPU Bandwidth Latency Test]
Device: 0, GeForce GTX 1080, pciBusID: 3, pciDeviceID: 0, pciDomainID:0
Device: 1, GeForce GTX 1080, pciBusID: 41, pciDeviceID: 0, pciDomainID:0
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0
***NOTE: In case a device doesn't have P2P access to other one, it falls back to normal memcopy procedure.
So you can see lesser Bandwidth (GB/s) and unstable Latency (us) in those cases.
P2P Connectivity Matrix
D\D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 44.99 4.63
1 4.73 49.25
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 176.67 0.46
1 0.66 255.81
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 257.41 5.53
1 5.40 256.42
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 255.38 1.12
1 1.28 254.56
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 1.20 10.46
1 10.36 1.24
CPU 0 1
0 7.17 13.92
1 13.99 7.51
P2P=Enabled Latency (P2P Writes) Matrix (us)
(the test hangs at this point and never prints the P2P-enabled latency matrix)
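For what it's worth, the same peer-to-peer capability can also be queried from PyTorch directly; this is just a sanity check one could run, not output from my setup:

```python
import torch

# Cross-check the P2P connectivity matrix reported by the CUDA sample.
for src in range(torch.cuda.device_count()):
    for dst in range(torch.cuda.device_count()):
        if src != dst:
            ok = torch.cuda.can_device_access_peer(src, dst)
            print(f"GPU {src} -> GPU {dst}: peer access {'possible' if ok else 'not possible'}")
```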