Histogram function in pytorch

wasiahmad · July 27, 2017, 2:44am

Is there any function in pytorch like numpy.histogram?

alexis-jacq · July 27, 2017, 7:02am

numpy.histogram(torch_tensor.numpy()) ?

Kaixhin · July 27, 2017, 10:12am

torch.histc

wasiahmad · July 28, 2017, 12:45am

Thanks for pointing that out.

I have a 3d tensor and I want to apply histogram on the 3rd dimension. How can I do that?

Kaixhin · July 28, 2017, 8:27am

A histogram puts scalar values from a vector into different bins. If you pass in a multidimensional tensor, it treats it like a vector. Could you clarify what you mean?
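
The flattening behaviour described here can be checked directly. A minimal sketch, assuming a hypothetical random 3-D input with values in [0, 1); the shape and bin count are arbitrary choices for illustration:

```python
import torch

# Hypothetical 3-D input; the shape is only for illustration.
x = torch.rand(2, 3, 100)

# torch.histc ignores the shape: the 3-D tensor and its flattened
# version produce the same counts.
h_3d = torch.histc(x, bins=5, min=0.0, max=1.0)
h_flat = torch.histc(x.flatten(), bins=5, min=0.0, max=1.0)

print(h_3d.shape)                 # torch.Size([5])
print(torch.equal(h_3d, h_flat))  # True
print(int(h_3d.sum()))            # 600, i.e. every element was counted once
```

The result is a single 1-D histogram over all 600 values, not one histogram per row.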

alexis-jacq · July 28, 2017, 8:44am

I think he wants a histogram along the z-axis for each (x, y) pair of an (x, y, z) tensor. That can be useful, for example, in a loss function that compares histograms, which is the greedy way to compare statistical properties between two tensors (I tried to do that once for art-style transfer).

Unfortunately, if torch.histc flattens the tensor, I see no way to avoid a loop over the (x, y) pairs.

Kaixhin · July 28, 2017, 10:44am

In that case, yes, this needs to be done manually, probably with a loop, unless numpy.histogram already does this, but I assume you would have mentioned it if that were the case.
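
The manual loop mentioned here might look roughly like the following. This is only a sketch: the helper name `histc_last_dim_loop` and the fixed range arguments `lo` and `hi` are assumptions made for illustration, not something proposed in the thread.

```python
import torch

def histc_last_dim_loop(t: torch.Tensor, bins: int, lo: float, hi: float) -> torch.Tensor:
    """Histogram the last axis of a 3-D tensor, one torch.histc call per (x, y) pair."""
    size_x, size_y, _ = t.shape
    out = torch.zeros(size_x, size_y, bins)
    for i in range(size_x):
        for j in range(size_y):
            # t[i, j] is the 1-D vector along z for this (x, y) pair.
            out[i, j] = torch.histc(t[i, j], bins=bins, min=lo, max=hi)
    return out

t = torch.rand(4, 5, 200)
h = histc_last_dim_loop(t, bins=10, lo=0.0, hi=1.0)
print(h.shape)  # torch.Size([4, 5, 10])
```

It makes one call per (x, y) pair, which is why it scales poorly for large batches.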

wasiahmad · July 28, 2017, 11:38pm

Yes, @alexis-jacq understood my problem. I am implementing a model where I have a 3d tensor of shape batch_size x sequence_len x feature_size, and after applying the histogram I expect a tensor of shape batch_size x sequence_len x num_bins.

I am hoping for a loop-less solution. Otherwise it will take longer on the GPU.

tom · July 29, 2017, 10:00pm

Hi,

If you are in the mood for a weekend hack, here is a not entirely serious and apparently not very stable (it seemed to be crashing for me fairly often) solution without a for loop.

I’m not sure how well it does with more than a very few features, but here you go.
It uses the fact that sparse matrix indices may contain the same coordinate multiple times; the matrix entry is then the sum of all values at that coordinate.

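import torch
from matplotlib import pyplot
# In a Jupyter notebook, additionally run: %matplotlib inline

nbins = 20
data = torch.randn(2, 1000, 2)  # batch x draws x (x, y)
data[1, :, 1] += data[1, :, 0]  # make x and y correlated in the second batch entry
# Map [-4, 4) to bin indices 0..nbins-1. Outliers are clamped into the edge bins,
# because an index outside the 20 x 20 grid makes the sparse construction fail.
d_scaled = (data * 2.5 + 10).floor().long().clamp_(0, nbins - 1).view(-1, 2)
# Batch index of every point: 0, ..., 0, 1, ..., 1
d_idx0 = torch.arange(data.size(0)).repeat_interleave(data.size(1)).view(-1, 1)
d_idx = torch.cat([d_idx0, d_scaled], dim=1)
d_ones = torch.ones(d_idx.size(0))
# Repeated coordinates are summed when the sparse tensor is converted to dense.
st = torch.sparse_coo_tensor(d_idx.t(), d_ones, (data.size(0), nbins, nbins))
hist = st.to_dense()
pyplot.subplot(1, 2, 1)
pyplot.contour(hist[0].numpy(), extent=(-4, 4, -4, 4))
pyplot.subplot(1, 2, 2)
pyplot.contour(hist[1].numpy(), extent=(-4, 4, -4, 4))
pyplot.show()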
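
The property the snippet relies on, namely that repeated coordinates in a sparse tensor are summed, can be seen in isolation. A minimal sketch with made-up indices, using `torch.sparse_coo_tensor`:

```python
import torch

# Coordinates as a 2 x nnz index matrix; the coordinate (0, 1) appears three times.
indices = torch.tensor([[0, 0, 0, 1],
                        [1, 1, 1, 2]])
values = torch.ones(4)
st = torch.sparse_coo_tensor(indices, values, size=(2, 3))

# Converting to dense sums the entries that share a coordinate.
dense = st.to_dense()
print(dense)
# tensor([[0., 3., 0.],
#         [0., 0., 1.]])
```

With all values set to 1, each dense entry is the number of points that landed on that coordinate, which is exactly a histogram count.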

Best regards

Thomas

Daniel_Morris · June 6, 2020, 9:55am

I am picking up on this topic from 3 years ago and see your clever solution to interpolating or binning a set of N-dimensional points into an N-dimensional tensor. That’s what I want to do, but I’m wondering if, since you wrote that, is there now a better way than using the sparse() command followed by to_dense()?

tom · June 6, 2020, 1:15pm

Personally, I think there is nothing wrong with it, but you could try using scatter_add from PyTorch Scatter instead.

Best regards

Thomas
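
For the per-row histogram asked about earlier in the thread, a scatter-add version could look like the sketch below. Note the assumptions: it uses the built-in `Tensor.scatter_add_` rather than the separate PyTorch Scatter package, the helper name `hist_last_dim` is hypothetical, all rows share one fixed range `[lo, hi]`, and values outside that range are clamped into the edge bins (torch.histc would ignore them instead).

```python
import torch

def hist_last_dim(t: torch.Tensor, bins: int, lo: float, hi: float) -> torch.Tensor:
    """Loop-free histogram over the last axis, with one fixed [lo, hi] range for all rows."""
    # Bin index of every element; out-of-range values are clamped into the edge bins.
    idx = ((t - lo) / (hi - lo) * bins).floor().long().clamp_(0, bins - 1)
    out = torch.zeros(*t.shape[:-1], bins, dtype=t.dtype, device=t.device)
    # Add 1 at position idx along the last axis; repeated indices accumulate.
    out.scatter_add_(-1, idx, torch.ones_like(t))
    return out

t = torch.rand(8, 16, 32)  # batch_size x sequence_len x feature_size
h = hist_last_dim(t, bins=10, lo=0.0, hi=1.0)
print(h.shape)  # torch.Size([8, 16, 10])
```

Because the bin index is computed elementwise and the accumulation is a single scatter, there is no Python loop over batch or sequence positions.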

Daniel_Morris · June 6, 2020, 3:59pm

Ah, interesting! Thanks Tom. However, I am trying to wrap my head around how I would scatter points into a 2D image/tensor: it looks like scatter only scatters along a single dimension.
If I can figure that out, I then actually have floating-point “indices” that I want to accumulate in a tensor. I suppose I could do a bilinear interpolation myself and then add them, unless you know of another function that would do that.
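
One possible way to handle both points is sketched below, purely as an illustration. A 2-D (row, col) index can be turned into the 1-D index `row * width + col`, so a scatter along a single dimension is enough, and the bilinear weights can be computed by hand for the four neighbouring pixels. The function name `splat_bilinear` and its signature are assumptions, not an existing PyTorch API.

```python
import torch

def splat_bilinear(points: torch.Tensor, values: torch.Tensor,
                   height: int, width: int) -> torch.Tensor:
    """Accumulate values at floating-point (row, col) positions into a height x width image."""
    r, c = points[:, 0], points[:, 1]
    r0, c0 = r.floor(), c.floor()
    fr, fc = r - r0, c - c0  # fractional offsets in [0, 1)
    r0, c0 = r0.long(), c0.long()
    img = torch.zeros(height * width, dtype=values.dtype)
    # Each point contributes to its four neighbouring pixels (a fixed 4-step loop,
    # not a loop over points).
    corners = ((0, 0, (1 - fr) * (1 - fc)), (0, 1, (1 - fr) * fc),
               (1, 0, fr * (1 - fc)), (1, 1, fr * fc))
    for dr, dc, w in corners:
        rr, cc = r0 + dr, c0 + dc
        inside = (rr >= 0) & (rr < height) & (cc >= 0) & (cc < width)
        # Flatten the 2-D index so that a 1-D scatter_add_ can be used.
        flat = rr[inside] * width + cc[inside]
        img.scatter_add_(0, flat, (values * w)[inside])
    return img.view(height, width)

points = torch.tensor([[1.5, 2.5], [0.0, 0.0]])  # (row, col) positions
values = torch.tensor([4.0, 1.0])
img = splat_bilinear(points, values, height=4, width=5)
print(img[1:3, 2:4])  # the value 4.0 is split evenly over four pixels
```

Contributions that would fall outside the image are dropped here; whether to drop or clamp them is a design choice that depends on the application.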

completementgaga · January 24, 2025, 12:28pm

What about this one, with a short loop whose number of iterations is the desired number of bins?
Inspired by the Stack Overflow question "Binning of data along one axis in numpy".

import torch

def histo(tensor: torch.Tensor, axis: int, nbins: int = 50) -> torch.Tensor:
    """Return the tensor where the given axis is replaced by bin counts, binning along this axis.

    Parameters
    ----------
    tensor : torch.Tensor
        input tensor
    axis : int
        axis along which to perform the transformation
    nbins : int, optional
        number of bins, by default 50

    Returns
    -------
    torch.Tensor
        We can consider the tensor as a family of lists of length tensor.shape[axis]. For each of these
        lists, the interval between its top and bottom value is divided into nbins sub-intervals, for which
        we do a bin count. The result of this bin count replaces the initial list in the output.
    """

    bottom, _ = tensor.min(dim=axis)
    top, _ = tensor.max(dim=axis)
    step = (top - bottom) / nbins
    # Use top itself as the last edge so that rounding cannot exclude the maximum.
    boundaries = [bottom + i * step for i in range(nbins)] + [top]
    # movedim (rather than swapaxes) keeps the remaining axes in their original order,
    # so the per-list boundaries broadcast correctly for any axis.
    tensor = tensor.movedim(axis, 0)
    res = torch.zeros((nbins,) + tensor.shape[1:], device=tensor.device)
    for i in range(nbins):
        above = boundaries[i] <= tensor
        if i < nbins - 1:
            below = tensor < boundaries[i + 1]
        else:
            # The last bin is closed on the right so that the maximum is counted.
            below = tensor <= boundaries[i + 1]
        res[i] = torch.count_nonzero(above & below, dim=0)
    return res.movedim(0, axis)