Explanation of behavior of a[[0,0]]=b ? Why always a[0]=b[0] and never a[0]=b[1]?

I have posted a question on Stack Overflow: python - PyTorch behavior of a[[0,0]]=b ? Why always a[0]=b[0] and never a[0]=b[1]? (conflict inside one single assignment expression) - Stack Overflow

Can someone help explain this phenomenon? How does PyTorch construct the computation graph for this kind of assignment to a view of a tensor?

I realized that this behavior may not be related to the computation graph. It may instead be related to the tensor's __setitem__.

The expression a[[0,0]] uses advanced indexing. How does PyTorch implement assignment to this kind of advanced-indexed tensor? Does PyTorch resolve the conflict inside the expression and copy the data in parallel, or does it simply copy the data serially, as NumPy does?
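For concreteness, here is a minimal example of the behavior I am asking about (the tensor values are just for illustration):

```python
import torch

a = torch.zeros(3)
b = torch.tensor([1.0, 2.0])

# duplicated advanced index: both elements of b target a[0]
a[[0, 0]] = b

# On my machine this always prints tensor([1., 0., 0.]),
# i.e. a[0] == b[0], never a[0] == b[1].
print(a)
```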

As I stated in the Stack Overflow post, I searched the PyTorch source code but failed to find an answer. Thank you in advance for any reply!

Hi Hello!

I don’t have an answer to your specific question, but let me offer some
speculation and context.

First, you should view this as “undefined behavior,” that is, anything could
happen. It is the user’s responsibility not to use duplicated indices in
cases such as these (e.g., assignment) where the duplicated indices
would need to be “resolved” somehow.

Note that, apparently, pytorch does not provide documentation for advanced
indexing, as lamented in this github issue.

The best I could find was this warning in the documentation for the related
case of scatter_():

Warning

When indices are not unique, the behavior is non-deterministic (one of the values from src will be picked arbitrarily) and the gradient will be incorrect (it will be propagated to all locations in the source that correspond to the same index)!

This is consistent with my experience, where you do get a valid value, but
without any guarantee as to which one. However, if the indexing (or, for that
matter, the scatter_()) algorithm makes use of parallelism (multiple cpu
or gpu pipelines), then I could imagine full-bore undefined behavior where,
for example, there’s a race condition in writing to the target location and
you end up with a garbage value where some of the bytes (or words) of
the target value come from one source location and some from another.
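As a toy illustration of that warning (my own example with made-up values, not
from the documentation), duplicated indices with scatter_() look like this:

```python
import torch

target = torch.zeros(4)
src = torch.tensor([10.0, 20.0])
index = torch.tensor([2, 2])  # duplicated index: both source values map to target[2]

# One of 10.0 or 20.0 ends up in target[2]; which one is not guaranteed.
target.scatter_(0, index, src)
print(target)
```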

Whether this is “merely” non-deterministic (as stated in the scatter_()
documentation) or fully undefined, you should view doing this as user
error. Even if you “always get a[0]=b[0],” you might no longer get that
same result if you change the sizes of your tensors or move from the cpu
to the gpu or move to a different model of gpu or upgrade to a new version
of pytorch.
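If you want to probe this yourself, something along the following lines would
do it (an untested sketch of my own; it checks whether a cuda device is
available and just reports which source element "wins" for a few tensor sizes):

```python
import torch

def which_source_wins(n, device):
    # duplicated advanced index: both elements of b are written to a[0]
    a = torch.zeros(n, device=device)
    b = torch.tensor([1.0, 2.0], device=device)
    a[[0, 0]] = b
    return 'b[0]' if a[0].item() == 1.0 else 'b[1]'

devices = ['cpu'] + (['cuda'] if torch.cuda.is_available() else [])
for device in devices:
    for n in (3, 1000, 1000000):
        print(device, n, which_source_wins(n, device))
```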

I can’t point you to the code where this actually happens, but even if I could,
it wouldn’t matter, because pytorch is free to change that code as long as
the new version still works for the unique-indices case, even if it gives you
a different result when the indices are not unique.

Best.

K. Frank


Thank you for your reply! Your explanation sounds reasonable. I will run some experiments with different tensor sizes, PyTorch versions, and GPUs.