1D and 2D Histograms in PyTorch


I am trying to implement an efficient histogram method in PyTorch.
I know PyTorch already has histc and bincount, but neither of them has a 2D version.
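For reference, one way a 2D count can be assembled from the existing bincount is to turn each (x, y) pair into a single flattened bin index. The following is only a minimal sketch under my own assumptions: the function name hist2d, the equal-width square binning, and the choice to drop out-of-range points are all illustrative and not part of PyTorch.

```python
import torch

def hist2d(x, y, bins, x_range, y_range):
    """Hypothetical helper: 2D histogram with `bins` equal-width bins per axis."""
    def to_index(v, lo, hi):
        # Map each value to an integer bin index; the upper edge goes to the last bin.
        idx = ((v - lo) / (hi - lo) * bins).long()
        return idx.clamp(0, bins - 1)

    # Assumption: points outside either range are ignored.
    keep = (x >= x_range[0]) & (x <= x_range[1]) & (y >= y_range[0]) & (y <= y_range[1])
    ix = to_index(x[keep], *x_range)
    iy = to_index(y[keep], *y_range)

    # Flatten the (ix, iy) pair into one index so that bincount can do the counting.
    flat = ix * bins + iy
    return torch.bincount(flat, minlength=bins * bins).reshape(bins, bins)

x = torch.tensor([0.1, 0.1, 0.9, 1.0])
y = torch.tensor([0.1, 0.9, 0.9, 1.0])
counts = hist2d(x, y, bins=2, x_range=(0.0, 1.0), y_range=(0.0, 1.0))
```

Rows of the result index the x bins and columns index the y bins. This does one pass over the data rather than k*n comparisons, so it is a different technique from the fully parallel comparison I describe next.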

I want the implementation to run in parallel so that it is fast; i.e., for a vector of k values and n bins, I want to perform the k*n comparisons in parallel.
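To make the k*n idea concrete, here is a rough sketch of it written with plain broadcasting instead of a custom kernel. The name hist1d_broadcast and the equal-width bins over [lo, hi] are my own assumptions for illustration. It materializes an (n, k) boolean mask, so memory grows with k*n.

```python
import torch

def hist1d_broadcast(x, n_bins, lo, hi):
    """Hypothetical helper: compare every value with every bin (k*n comparisons)."""
    edges = torch.linspace(lo, hi, n_bins + 1, device=x.device, dtype=x.dtype)
    left = edges[:-1].unsqueeze(1)    # shape (n, 1)
    right = edges[1:].unsqueeze(1)    # shape (n, 1)
    v = x.reshape(1, -1)              # shape (1, k)

    # Broadcasting yields an (n, k) mask: inside[i, j] is True if x[j] falls in bin i.
    inside = (v >= left) & (v < right)
    # Bins are half-open, so add values equal to the upper edge to the last bin.
    inside[-1] |= (v[0] == hi)

    # Summing over the k axis gives one count per bin.
    return inside.sum(dim=1)

x = torch.tensor([0.0, 0.1, 0.5, 0.9, 1.0])
counts = hist1d_broadcast(x, n_bins=2, lo=0.0, hi=1.0)
```

The same code runs on the GPU if x is a CUDA tensor, since every comparison is an elementwise operation.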

I have looked at the maximum grid and block sizes for CUDA, and this is doable given my number of bins and the size of my vector.

How should I go about implementing this so that it integrates with PyTorch?