You might be setting the env variable too late in your actual script. Once the CUDA context has been created, the env variable no longer has any effect, which is why I usually recommend exporting it in your current terminal or prepending it to your python command in the terminal.
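
As a rough sketch of what "prepending it to your python command" amounts to: the child process sees the variable from its very first line, before any CUDA context could exist. I'm assuming here that the variable is `CUDA_VISIBLE_DEVICES` (the post doesn't name it), and `run_with_env` is just a hypothetical helper for illustration.

```python
import os
import subprocess
import sys

# Assumption: the variable in question is CUDA_VISIBLE_DEVICES.
ENV_NAME = "CUDA_VISIBLE_DEVICES"


def run_with_env(code, value):
    """Run `code` in a fresh Python process that starts with ENV_NAME set.

    This mirrors `CUDA_VISIBLE_DEVICES=<value> python script.py` in a shell.
    """
    env = dict(os.environ)
    env[ENV_NAME] = value
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child reads the variable before doing anything else.
child_code = f"import os; print(os.environ['{ENV_NAME}'])"
seen = run_with_env(child_code, "1")
print(seen)
```

If you would rather set it inside the script, put `os.environ[...] = ...` at the very top, before anything that could initialize CUDA.
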
That is a good idea, as DP suffers from some overhead from cloning the model’s state_dict in each forward pass as well as from imbalanced GPU memory usage. DDP should thus give you better performance.
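
For reference, a minimal sketch of wrapping a model in DDP could look like the following. It is only an illustration under a few assumptions: a single process with `world_size=1` on CPU using the `gloo` backend so it can run anywhere, a toy `nn.Linear` model, and hypothetical helper names (`find_free_port`, `run_single_step`). A real setup would launch one process per GPU (e.g. via `torchrun`) and pass `device_ids`.

```python
import os
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def find_free_port():
    """Ask the OS for an unused local TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def run_single_step():
    """Initialize a one-process group, wrap a toy model in DDP, take one step."""
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(find_free_port())
    # Assumption: gloo on CPU with a single rank, purely for illustration.
    dist.init_process_group(backend="gloo", rank=0, world_size=1)
    try:
        model = DDP(nn.Linear(4, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        loss = model(torch.randn(8, 4)).sum()
        optimizer.zero_grad()
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


loss_value = run_single_step()
```

Each process keeps its own replica of the model for the whole run, so there is no per-iteration replication step as in DP.
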