N dimensional histogram function in pytorch CPU/GPU

Marian_Ivanov · November 3, 2019, 5:28pm

We are working on interactive visualization of multi-dimensional data
Usually O(2-7) dimensions (still dense) Currently we are using numpy implementation for multidimensional histogram filling , slicing, summing:

creating NDimensional histogram once:

H, axis = np.histogramdd(inputArray.values, bins=bins, range=hRange)

querying interactively - many times

y = np.sum(H[hSliceLocal], axis=axis)

I would like to use PyTorch (or Numba) to get faster GPU implementation

My questions are following (my current Python experience is limitted):

Before investing time, which speed-up factor do you expect in histogramm filling resp. slicing ?
Did PyTorch team consider providing implementation such as in numpy? We would be happy to contribute, but our experience is limited

I’m aware of the discussion (from 2017) in

related to histogramming in 1D resp. hack for 3D.

I’m looking for equivalent of the numpy.histogramdd.html (CPU/GPU)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html

albanD · November 3, 2019, 6:12pm

Hi,

Does histc work for you? Or you would need a multidimensional version of it?

Marian_Ivanov · November 4, 2019, 8:06am

I personally prefer multi-dimensional version of it with interface as in the numpy.histogramdd
(see my numpy examples in original post)

albanD · November 4, 2019, 4:11pm

I am not familiar with this particular numpy function
I don’t think we have such function in pytorch at the moment, but I’m sure we will be happy to accept a PR to add this new feature.

Do you know how it is implemented? Is it only a wrapper around the 1D version, or does it need substantially different algorithms?

Marian_Ivanov · November 4, 2019, 9:07pm

I checked numpy implementation in https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/histograms.py#L935-L1113
At the end algorithm in (step 3) is using 1D histogramming - np.bincount
Hovewer, numpy implementation assume non uniform binning - to find bins they use internally np.searchsorted.

In case searchsorted will be difficult to implement (I did not find it in pytorch= only external https://github.com/aliutkus/torchsearchsorted), uniform binning will be sufficient.

Algorithm:

Calculate bin edges if not provided
Compute the bin number each sample falls into - assuming non uniform binning
np.searchsorted
Compute the sample indices in the flattened histogram matrix
np.ravel_multi_index
Compute the number of repetitions in xy and assign it to the flattened histmat.
np.bincount
Shape into a proper matrix
hist.reshape(nbin)

albanD · November 4, 2019, 9:16pm

That would be a useful addition I think.
Could you open an issue on github for a new feature with these informations?

Marian_Ivanov · November 5, 2019, 5:48pm

New issue feature request opened: https://github.com/pytorch/pytorch/issues/29209