position: cuda:0
mask : cuda:0
input : cuda:0
position: cuda:0
mask : cuda:1
input : cuda:1
It turns out that after I revised the code, position and mask seem to end up on different CUDA devices, even when I specify
CUDA_VISIBLE_DEVICE=1 python3.6 train.py
before running.
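
One thing that may be worth ruling out: the variable the CUDA runtime reads is CUDA_VISIBLE_DEVICES (plural). If the command was typed exactly as shown above, the setting would be ignored and every GPU would stay visible, which would fit seeing cuda:1 in the output (with a single visible GPU, the process would only ever see it as cuda:0). Below is a minimal sketch of a sanity check that could be called at the top of train.py. The helper name check_cuda_env is hypothetical, and this is only a guess at the cause, not a confirmed diagnosis.

```python
import os


def check_cuda_env(environ=None):
    """Return a list of warnings about CUDA device-selection variables.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    (Hypothetical helper, not part of any library.)
    """
    env = os.environ if environ is None else environ
    warnings = []

    # The misspelt singular form is not read by the CUDA runtime.
    if "CUDA_VISIBLE_DEVICE" in env:
        warnings.append(
            "CUDA_VISIBLE_DEVICE is set, but the CUDA runtime reads "
            "CUDA_VISIBLE_DEVICES (with a trailing S); the value is ignored."
        )

    # Without the correctly spelt variable, every GPU is visible.
    if "CUDA_VISIBLE_DEVICES" not in env:
        warnings.append(
            "CUDA_VISIBLE_DEVICES is not set; all GPUs are visible to this process."
        )

    return warnings


if __name__ == "__main__":
    for message in check_cuda_env():
        print("warning:", message)
```

If this prints nothing and the tensors still land on different devices, the cause is more likely in the revised code itself, for example a tensor created with a hard-coded device.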