Torch.bincount behaves differently on CPU and GPU

It seems that for a tensor with dtype=uint8 on device=cuda, if the tensor contains the element 255, then torch.bincount counts no bins other than 255.
The problem only occurs when the tensor is on the GPU.
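A minimal repro along these lines shows the behavior (the GPU branch only demonstrates the bug on affected versions; on a fixed build both devices agree):

```python
import torch

# A uint8 tensor containing 255 alongside other values triggers the issue.
x = torch.tensor([0, 1, 2, 255], dtype=torch.uint8)

# CPU: bins 0, 1, 2, and 255 each get a count of 1, as expected.
print(torch.bincount(x))

# GPU (only if available): on affected versions, only bin 255 is counted.
if torch.cuda.is_available():
    print(torch.bincount(x.cuda()))
```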

My PyTorch version is 1.10.0.

This error doesn’t occur in PyTorch 1.11; can you try updating your version?

It seems this error still occurs in PyTorch 1.11.0 with torch.cuda.version = 11.3.

I can see this on a recent post-1.11 dev branch. At first glance, it could be a bug in the CUDA implementation that shows up with uint8 - for dtypes int and long, bincount seems to work as expected for me.

Best regards


Staring down the code a bit, this might be suspicious:

I will check if that is the culprit and if so, I’ll send a PR.

Best regards


Edited: This was not it. The computation is in int64 anyway.


Now I found it.
The maxvalue computed below has an overflow: we have nbins = 256, so maxvalue wraps around to 0 for uint8. Right now I’m checking whether the fix should be clamping to the numeric type’s max value or setting maxvalue to nbins - 1, and then I’ll send a PR.
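The wraparound itself can be illustrated outside of PyTorch; this is a hedged sketch of the arithmetic (using ctypes as a stand-in for the kernel's 8-bit scalar type), not the actual kernel code:

```python
import ctypes

# bincount derives the number of bins from the maximum input value:
# for a uint8 tensor containing 255, nbins = 255 + 1 = 256.
nbins = 255 + 1

# If that bound is then held in the input's own 8-bit scalar type,
# it wraps: 256 does not fit in uint8 and becomes 0.
maxvalue = ctypes.c_uint8(nbins).value
print(maxvalue)  # 0 -- the upper bound collapses, breaking the range check
```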

Best regards


P.S.: Thank you @HAL-42 for reporting this with a repro. That is always very helpful.

P.P.S.: I sent a PR: Fix bincount to use acc scalar for the bounds by t-vi · Pull Request #76979 · pytorch/pytorch · GitHub; we will see how it goes.