nn.DataParallel gets stuck

Hello @ptrblck,

I updated the drivers to 470. The issue persists; I'm attaching the output of watch nvidia-smi below.

Every 2.0s: nvidia-smi    ampere.lix.polytechnique.fr: Thu Jul 22 09:21:53 2021

Thu Jul 22 09:21:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:21:00.0 Off |                    0 |
| N/A   25C    P0    59W / 250W |   1768MiB / 40536MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:81:00.0 Off |                    0 |
| N/A   25C    P0    58W / 250W |   1502MiB / 40536MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCI...  On   | 00000000:E2:00.0 Off |                    0 |
| N/A   20C    P0    32W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5629      C   .../envs/torch1p9/bin/python     1765MiB |
|    1   N/A  N/A      5630      C   .../envs/torch1p9/bin/python     1499MiB |
+-----------------------------------------------------------------------------+

I ran the same script you suggested above, using the latest release of PyTorch. It gets stuck once again, but this time at least Ctrl+C can interrupt the script, so I don't have to kill the process to stop it.
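
For reference, the pattern I'm testing is just a plain DataParallel forward/backward loop. The sketch below is not your exact script (I'm assuming a toy nn.Linear model and random inputs here), but it shows the shape of what hangs on this machine:

```python
# Minimal nn.DataParallel sketch (assumption: toy linear model and random data,
# not the exact script posted above, just the same pattern that hangs for me).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

x = torch.randn(64, 10).cuda()

for i in range(10):
    out = model(x)      # forward pass scattered across GPU 0 and GPU 1
    loss = out.sum()
    loss.backward()     # it never gets past the first iteration for me
    print(f"iteration {i} done", flush=True)
```

With both GPUs pinned at 100% utilization in nvidia-smi (as shown above) and no iteration ever printed, it looks like the two replicas are stuck waiting on each other rather than doing useful work.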