### 🐛 Describe the bug
There appears to be a precision error when a tensor is passed through a `torch.…distributed.send()` / `torch.distributed.recv()` pair.
`torch.distributed.recv()` receives a tensor correctly only when it is sent as `torch.float32`. The other float types, `torch.float16` and `torch.float64`, lead to wrong tensor values on the receiver side.
## Reproduce
(on a dual-GPU platform)
```python
import torch
import torch.multiprocessing as mp
import torch.distributed as dist
import time


def main_worker(rank, world_size, args):
    # mp.spawn passes the process index as `rank`; `args` is unused here.
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:9001",
        world_size=world_size,
        rank=rank,
    )
    print("process begin", rank)

    for datatype in [None, torch.float, torch.float16, torch.float32, torch.float64]:
        if rank == 0:
            # Sender: create a tensor of the dtype under test and send it to rank 1.
            print(f"Current datatype: {datatype}.")
            t = torch.rand([4, 4], dtype=datatype).to(torch.device('cuda', rank))
            print(f"Generate tensor{t}")
            dist.send(t, 1)
        elif rank == 1:
            # Receiver: the buffer always has the default dtype (torch.float32).
            r = torch.rand([4, 4]).to(torch.device('cuda', rank))
            dist.recv(r, 0)
            print("recv", r)
            print()
        time.sleep(1)


def main():
    mp.spawn(main_worker, nprocs=2, args=(2, 2))


if __name__ == "__main__":
    main()
```
## Output:
```python
process begin 0
Current datatype: None.
process begin 1
Generate tensortensor([[0.9230, 0.2856, 0.9419, 0.2844],
[0.9732, 0.7029, 0.0026, 0.9697],
[0.2188, 0.4143, 0.5163, 0.9863],
[0.1562, 0.3484, 0.1138, 0.3271]], device='cuda:0')
recv tensor([[0.9230, 0.2856, 0.9419, 0.2844],
[0.9732, 0.7029, 0.0026, 0.9697],
[0.2188, 0.4143, 0.5163, 0.9863],
[0.1562, 0.3484, 0.1138, 0.3271]], device='cuda:1')
Current datatype: torch.float32.
Generate tensortensor([[0.6158, 0.9911, 0.0677, 0.2109],
[0.0591, 0.5609, 0.4182, 0.4432],
[0.9296, 0.2350, 0.1028, 0.7265],
[0.1949, 0.0324, 0.4484, 0.8104]], device='cuda:0')
recv tensor([[0.6158, 0.9911, 0.0677, 0.2109],
[0.0591, 0.5609, 0.4182, 0.4432],
[0.9296, 0.2350, 0.1028, 0.7265],
[0.1949, 0.0324, 0.4484, 0.8104]], device='cuda:1')
Current datatype: torch.float16.
Generate tensortensor([[0.7212, 0.8945, 0.3042, 0.3184],
[0.3804, 0.9648, 0.8076, 0.9756],
[0.3862, 0.7358, 0.6611, 0.2539],
[0.4365, 0.9434, 0.7075, 0.6084]], device='cuda:0',
dtype=torch.float16)
recv tensor([[2.5669e-03, 5.6701e-07, 5.6217e-03, 6.2936e-03],
[4.3337e-04, 1.3432e-07, 4.2790e-03, 1.0597e-04],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]], device='cuda:1')
Current datatype: torch.float32.
Generate tensortensor([[0.7289, 0.0532, 0.6294, 0.5030],
[0.1043, 0.3015, 0.2626, 0.2357],
[0.8202, 0.1919, 0.3556, 0.2653],
[0.9763, 0.3292, 0.9931, 0.8236]], device='cuda:0')
recv tensor([[0.7289, 0.0532, 0.6294, 0.5030],
[0.1043, 0.3015, 0.2626, 0.2357],
[0.8202, 0.1919, 0.3556, 0.2653],
[0.9763, 0.3292, 0.9931, 0.8236]], device='cuda:1')
Current datatype: torch.float64.
Generate tensortensor([[0.1401, 0.5205, 0.3881, 0.1536],
[0.4686, 0.3280, 0.0725, 0.7440],
[0.5029, 0.2960, 0.5149, 0.2452],
[0.2024, 0.5243, 0.8930, 0.2613]], device='cuda:0',
dtype=torch.float64)
recv tensor([[-9.8724e-14, 1.5151e+00, 1.6172e-05, 1.7551e+00],
[ 1.6390e+35, 1.6941e+00, -1.9036e+12, 1.5286e+00],
[ 6.4192e-38, 1.7343e+00, 2.3965e+07, 1.6640e+00],
[-1.7412e+19, 1.3951e+00, -6.5071e+35, 1.8110e+00]], device='cuda:1')
```
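The wrong values in the `torch.float16` and `torch.float64` rounds look structured rather than random, which may help narrow down the cause. In the reproduction, rank 1 always allocates `r` with the default dtype `torch.float32`, whatever dtype rank 0 sends. The sketch below uses only Python's standard `struct` module (no GPU and no `torch.distributed`) to reread the sender's bytes as little-endian `float32`. The helper name `reinterpret_as_float32` is hypothetical, and the inputs are the rounded values printed above, so only approximate agreement can be expected. Whether the NCCL backend really hands over raw bytes this way is an assumption and has not been checked against its source.

```python
import struct


def reinterpret_as_float32(values, fmt):
    """Pack `values` with struct format `fmt` ('e' = float16, 'd' = float64)
    and reread the same bytes as little-endian float32 values.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    raw = struct.pack(f"<{len(values)}{fmt}", *values)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# First row of the float64 tensor printed by rank 0 (rounded to 4 decimals).
sent_f64 = [0.1401, 0.5205, 0.3881, 0.1536]
as_f32 = reinterpret_as_float32(sent_f64, "d")

# Each float64 turns into two float32 values. The odd positions hold the
# high 32 bits, which are insensitive to the rounding of the inputs; compare
# them with the second and fourth columns of the first two rows rank 1 printed.
print([round(x, 4) for x in as_f32[1::2]])

# The first two float16 values printed by rank 0 share one float32 slot;
# compare the result with the first value rank 1 printed in that round.
print(reinterpret_as_float32([0.7212, 0.8945], "e"))
```

If the printed numbers agree with what rank 1 received, that would suggest the payload is being reinterpreted as the receive buffer's dtype rather than converted, so allocating `r` with the sender's dtype may be worth trying before treating this as a precision problem.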
### Versions
I tested and confirmed this phenomenon on two dual-GPU PCs and two versions of `PyTorch`.
#### Configuration 0: PyTorch `1.12.0` + 2× `TITAN RTX`
```python
Collecting environment information...
PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA TITAN RTX
GPU 1: NVIDIA TITAN RTX
Nvidia driver version: 510.73.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] info-nce-pytorch==0.1.4
[pip3] numpy==1.22.3
[pip3] torch==1.12.0
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.12.0
[pip3] torchinfo==1.6.5
[pip3] torchvision==0.13.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.6.0 hecad31d_10 conda-forge
[conda] info-nce-pytorch 0.1.4 pypi_0 pypi
[conda] libblas 3.9.0 12_linux64_mkl conda-forge
[conda] libcblas 3.9.0 12_linux64_mkl conda-forge
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.22.3 py39he7a7128_0
[conda] numpy-base 1.22.3 py39hf524024_0
[conda] pytorch 1.12.0 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-tb-profiler 0.4.0 pypi_0 pypi
[conda] torchaudio 0.12.0 py39_cu116 pytorch
[conda] torchinfo 1.6.5 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.13.0 py39_cu116 pytorch
```
#### Configuration 1: PyTorch `1.8.2 LTS` + 2× `TITAN RTX`
```python
Collecting environment information...
PyTorch version: 1.8.2
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA TITAN RTX
GPU 1: NVIDIA TITAN RTX
Nvidia driver version: 510.73.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] info-nce-pytorch==0.1.4
[pip3] numpy==1.22.3
[pip3] torch==1.8.2
[pip3] torchaudio==0.8.2
[pip3] torchinfo==1.6.5
[pip3] torchvision==0.9.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] info-nce-pytorch 0.1.4 pypi_0 pypi
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.21.5 py38he7a7128_2
[conda] numpy-base 1.21.5 py38hf524024_2
[conda] pytorch 1.8.2 py3.8_cuda11.1_cudnn8.0.5_0 pytorch-lts
[conda] torchaudio 0.8.2 py38 pytorch-lts
[conda] torchinfo 1.6.5 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.9.2 py38_cu111 pytorch-lts
```
#### Configuration 2: PyTorch `1.12.0` + 2× `RTX3090`
```python
Collecting environment information...
PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.0-rc2
Libc version: glibc-2.31
Python version: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.73.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] info-nce-pytorch==0.1.4
[pip3] numpy==1.22.3
[pip3] torch==1.12.0
[pip3] torchaudio==0.12.0
[pip3] torchvision==0.13.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.6.0 habf752d_9 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] info-nce-pytorch 0.1.4 pypi_0 pypi
[conda] libblas 3.9.0 12_linux64_mkl conda-forge
[conda] libcblas 3.9.0 12_linux64_mkl conda-forge
[conda] liblapack 3.9.0 12_linux64_mkl conda-forge
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py310ha2c4b55_0 conda-forge
[conda] mkl_fft 1.3.1 py310h2b4bcf5_1 conda-forge
[conda] mkl_random 1.2.2 py310h00e6091_0
[conda] numpy 1.22.3 py310hfa59a62_0
[conda] numpy-base 1.22.3 py310h9585f30_0
[conda] pytorch 1.12.0 py3.10_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.12.0 py310_cu116 pytorch
[conda] torchvision 0.13.0 py310_cu116 pytorch
```
#### Configuration 3: PyTorch `1.8.2 LTS` + 2× `RTX3090`
```python
Collecting environment information...
PyTorch version: 1.8.2
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.0-rc2
Libc version: glibc-2.31
Python version: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-52-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.73.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] info-nce-pytorch==0.1.4
[pip3] numpy==1.21.5
[pip3] torch==1.8.2
[pip3] torch-tb-profiler==0.3.1
[pip3] torchaudio==0.8.2
[pip3] torchinfo==1.7.0
[pip3] torchvision==0.9.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.21.5 py38he7a7128_2
[conda] numpy-base 1.21.5 py38hf524024_2
[conda] pytorch 1.8.2 py3.8_cuda11.1_cudnn8.0.5_0 pytorch-lts
[conda] torchaudio 0.8.2 py38 pytorch-lts
[conda] torchinfo 1.7.0 pyhd8ed1ab_0 conda-forge
[conda] torchvision 0.9.2 py38_cu111 pytorch-lts
```
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @kwen2501