Inconsistent results from indexed assignment operations

As shown in the code demo below, I want to update the data at specified locations of data_store using an index tensor.

import torch

# switch between the two devices to compare results
# device = "cuda:0"
device = "cpu"
data_store = torch.zeros([5, 3]).to(device)
data_store_cp = torch.zeros([5, 3]).to(device)
index = torch.randint(0, 4, [100]).to(device)      # 100 indices drawn from {0, 1, 2, 3}
data = torch.randn([index.size(0), 3]).to(device)
data_store[index.long()] = data                     # same indexed assignment ...
data_store_cp[index.long()] = data                  # ... repeated on a copy
print(f"equal: {torch.equal(data_store, data_store_cp)}")

When

device = "cpu"

I get “True”. However, when I switch to

device = "cuda:0"

I get “False”.
I wonder why this happens and how I can modify the code so that the result is “True” under the CUDA setting as well.
Thank you in advance.

Versions:

pytorch: 1.9.0
cuda: 11.0

See the torch.equal bug.

Thanks for your reply!
But when I check the contents of data_store and data_store_cp, they are different most of the time.

Hi Yaqi!

At issue is that index contains many duplicate values, so you are attempting to
write to the same location in data_store multiple times in a single assignment
operation. Doing so is not supported by pytorch and is not well defined.
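
You can confirm this directly: with 100 draws from only four possible values,
index necessarily repeats entries. A quick check, reusing index from your demo:

n_writes = index.numel()          # 100 attempted row writes
n_rows = index.unique().numel()   # at most 4 distinct rows are actually targeted
print(n_writes, n_rows)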

Why you get different results from nearly identical assignment statements is a
bit of a mystery. Presumably pytorch takes advantage of the freedom it is given
(it does not have to make the multiple writes well defined) to perform some
optimization, especially on the gpu.
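
One way to sidestep the issue is to make the assignment well defined by writing
each row at most once, for example by keeping only the last occurrence of every
index. This is just a minimal sketch that reuses index, data, data_store, and
data_store_cp from your demo, and it assumes your version of torch.sort accepts
the stable=True argument:

idx = index.long()
# stable sort groups equal indices while preserving their original order
sorted_idx, order = torch.sort(idx, stable=True)
# mark the last entry within each group of equal indices
keep = torch.ones_like(sorted_idx, dtype=torch.bool)
keep[:-1] = sorted_idx[1:] != sorted_idx[:-1]
rows = sorted_idx[keep]        # each row index appears exactly once
vals = data[order[keep]]       # value from the last occurrence of that index
data_store[rows] = vals        # well defined: every row is written at most once
data_store_cp[rows] = vals
print(torch.equal(data_store, data_store_cp))

Because no row is written more than once, the result no longer depends on which
duplicate “wins,” and the two stores should compare equal on both cpu and cuda.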

Best.

K. Frank