It seems that for a tensor with dtype=uint8 and device=cuda, if any element equals 255, torch.bincount does not count any bins other than 255.
The problem only occurs when the tensor is on the GPU.
I can reproduce this on a dev branch from after the 1.11 cut. At first glance, it looks like a bug in the CUDA implementation that only shows up with uint8; for dtypes int and long, bincount seems to work as expected for me.
Now I found it.
The maxvalue computed below overflows: when the tensor contains 255 we have nbins = 256, so maxvalue wraps around to 0 for uint8. Right now I'm checking whether the fix should be clamping to the numeric type's max value or setting maxvalue to nbins - 1, and then I'll send a PR.
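To make the wraparound concrete, here is a minimal sketch in plain Python. It only mimics the arithmetic described above and is not the actual CUDA code: the helper names (`to_uint8`, `max_value_as_written`, `max_value_clamped`, `max_value_nbins_minus_one`) are hypothetical, and it assumes nbins is computed as the largest element plus one and then stored in the input dtype.

```python
import ctypes

UINT8_MAX = 255  # largest value representable in uint8


def to_uint8(value: int) -> int:
    """Mimic a C-style conversion to uint8 (wraps modulo 256)."""
    return ctypes.c_uint8(value).value


def max_value_as_written(largest_element: int) -> int:
    # Assumption: nbins is computed in a wider integer type,
    # then maxvalue is stored in the input dtype (uint8 here).
    nbins = largest_element + 1
    return to_uint8(nbins)


def max_value_clamped(largest_element: int) -> int:
    # Candidate fix 1: clamp to the numeric type's max value.
    nbins = largest_element + 1
    return to_uint8(min(nbins, UINT8_MAX))


def max_value_nbins_minus_one(largest_element: int) -> int:
    # Candidate fix 2: use nbins - 1, which always fits in the dtype.
    nbins = largest_element + 1
    return to_uint8(nbins - 1)


if __name__ == "__main__":
    for largest in (3, 254, 255):
        print(
            largest,
            max_value_as_written(largest),
            max_value_clamped(largest),
            max_value_nbins_minus_one(largest),
        )
```

With a largest element of 255, the first helper returns 0 while both candidate fixes return 255. The two candidates differ for smaller inputs (255 versus 254 when the largest element is 254), so which one is appropriate depends on whether the kernel treats maxvalue as an inclusive or exclusive bound; the sketch does not settle that.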
Best regards
Thomas
P.S.: Thank you @HAL-42 for reporting this with repro. That is always very helpful.