Is there any function in pytorch like numpy.histogram?
numpy.histogram(torch_tensor.numpy())?
Thanks for pointing that out.
I have a 3d tensor and I want to apply histogram on the 3rd dimension. How can I do that?
A histogram puts scalar values from a vector into different bins. If you pass in a multidimensional tensor, it treats it like a vector. Could you clarify what you mean?
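To make the flattening behaviour concrete, here is a minimal sketch using torch.histc on a small 2-D tensor. The input values, the bin count and the range are made up for illustration.

```python
import torch

# A 2 x 3 tensor; torch.histc ignores the shape and bins all six values together.
x = torch.tensor([[0.0, 1.0, 2.0],
                  [2.0, 3.0, 3.0]])

# Four equal-width bins covering [0, 4].
hist = torch.histc(x, bins=4, min=0.0, max=4.0)
print(hist.shape)  # the result is 1-D, whatever the shape of x
print(hist)
```

The result has one entry per bin and no trace of the original rows, which is why a per-row histogram needs extra work.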
I think he wants a histogram along the z-axis for each pair (x,y) of an (x,y,z) tensor. That can be useful, for example, in a loss function that compares histograms, which is a greedy way to compare statistical properties between two tensors (I tried that once for art-style transfer).
Unfortunately, if torch.histc flattens the tensor, I see no way to avoid a loop over the pairs (x,y).
In that case, yes, this needs to be done manually, probably with a loop. Unless numpy.histogram does this, but I assume you would have mentioned it if that were the case.
Yes, @alexis-jacq understood my problem. I am implementing a model where I have a 3d tensor of shape batch_size x sequence_len x feature_size, and after applying the histogram I expect a tensor of shape batch_size x sequence_len x num_bins.
I am hoping for a loop-free solution; otherwise it will take longer on the GPU.
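One loop-free way to get a batch_size x sequence_len x num_bins result is to turn every value into a bin index and accumulate ones with scatter_add_. The sketch below assumes fixed bin edges shared by all rows. The helper name batched_histogram and its lo/hi parameters are hypothetical, not part of PyTorch. Unlike torch.histc, it clamps out-of-range values into the first or last bin instead of dropping them.

```python
import torch

def batched_histogram(x: torch.Tensor, nbins: int, lo: float, hi: float) -> torch.Tensor:
    """Histogram over the last dimension of x (hypothetical helper, not a PyTorch API)."""
    # Map each value to an integer bin index in [0, nbins - 1].
    idx = ((x - lo) / (hi - lo) * nbins).floor().long().clamp_(0, nbins - 1)
    hist = torch.zeros(*x.shape[:-1], nbins, dtype=x.dtype, device=x.device)
    # For every element of x, add 1 to the bin selected by idx along the last dim.
    hist.scatter_add_(-1, idx, torch.ones_like(x))
    return hist

x = torch.rand(4, 7, 16)  # batch_size x sequence_len x feature_size
hist = batched_histogram(x, nbins=5, lo=0.0, hi=1.0)
print(hist.shape)  # batch_size x sequence_len x num_bins
```

Each row of the output sums to feature_size, since every input value lands in exactly one bin.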
Hi,
If you are in the mood for a weekend hack, here is a not entirely serious and apparently not very stable (it seemed to be crashing for me fairly often) solution without a for loop.
I’m not sure how well it works with more than a handful of features, but here you go.
It uses the fact that sparse matrix indices may contain the same coordinate multiple times; the matrix entry is then the sum of all values at that coordinate.
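import torch
from matplotlib import pyplot

nbins = 20
data = torch.randn(2, 1000, 2)  # batch x draws x (x, y)
data[1, :, 1] += data[1, :, 0]  # correlate x and y in the second batch element
# Scale to integer bin coordinates and shift them so that they start at 0.
d_scaled = (data * 2.5).long().view(-1, 2)
d_scaled -= d_scaled.min()
# Clamp outliers into the last bin so no index falls outside the 20 x 20 grid.
d_scaled.clamp_(max=nbins - 1)
# Batch index of every point: 0 repeated once per draw, then 1, and so on.
d_idx0 = torch.arange(data.size(0)).repeat_interleave(data.size(1)).view(-1, 1)
d_idx = torch.cat([d_idx0, d_scaled], dim=1)
d_ones = torch.ones(d_idx.size(0))
# Repeated coordinates are summed when the sparse tensor is converted to dense.
st = torch.sparse_coo_tensor(d_idx.t(), d_ones, (data.size(0), nbins, nbins))
hist = st.to_dense()
pyplot.subplot(1, 2, 1)
pyplot.contour(hist[0].numpy(), extent=(-4, 4, -4, 4))
pyplot.subplot(1, 2, 2)
pyplot.contour(hist[1].numpy(), extent=(-4, 4, -4, 4))
pyplot.show()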
Best regards
Thomas
I am picking up this topic from 3 years ago and see your clever solution for interpolating or binning a set of N-dimensional points into an N-dimensional tensor. That’s what I want to do, but I’m wondering whether, since you wrote that, there is now a better way than using the sparse() command followed by to_dense()?
Personally, I think there is nothing wrong with it, but you could try using scatter_add from PyTorch Scatter instead.
Best regards
Thomas
Ah, interesting! Thanks Tom. However, I am trying to wrap my head around how I would scatter points into a 2D image/tensor – it looks like scatter only scatters along a single dimension.
If I can figure that out, I then actually have floating point “indices” that I want to accumulate in a tensor. I suppose I could do the bilinear interpolation myself and then add the results, unless you know of another function that would do that.
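One way around the single-dimension limitation is to flatten the image and convert each (row, col) pair to one linear index, so that a 1-D index_add_ (or scatter_add_) is enough. The sketch below also handles floating point coordinates by spreading each point over its four neighbouring pixels with bilinear weights. The helper name splat_bilinear and the choice to drop out-of-image corners are assumptions for illustration, not an existing PyTorch function.

```python
import torch

def splat_bilinear(points: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Accumulate N x 2 float (row, col) points into a height x width image.

    Hypothetical helper: each point contributes a total weight of 1, split
    over its four neighbouring pixels by bilinear interpolation weights.
    """
    r, c = points[:, 0], points[:, 1]
    r0, c0 = r.floor(), c.floor()
    fr, fc = r - r0, c - c0  # fractional offsets in [0, 1)
    image = torch.zeros(height * width, dtype=points.dtype, device=points.device)
    # Four fixed iterations (one per corner), independent of the number of points.
    for dr, dc, w in (
        (0, 0, (1 - fr) * (1 - fc)),
        (0, 1, (1 - fr) * fc),
        (1, 0, fr * (1 - fc)),
        (1, 1, fr * fc),
    ):
        rr = (r0 + dr).long()
        cc = (c0 + dc).long()
        # Drop corners that fall outside the image.
        inside = (rr >= 0) & (rr < height) & (cc >= 0) & (cc < width)
        # A 2-D coordinate becomes one flat index into the flattened image.
        flat = rr[inside] * width + cc[inside]
        # index_add_ sums the weights of points that share a pixel.
        image.index_add_(0, flat, w[inside])
    return image.view(height, width)

points = torch.tensor([[0.5, 0.5], [2.0, 1.0], [2.0, 1.0]])
image = splat_bilinear(points, height=3, width=3)
print(image)
```

The same flat-index trick extends to more dimensions by multiplying each coordinate by the stride of its axis before summing.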
What about this one, with a short loop whose number of iterations is the desired number of bins?
Inspired by the Stack Overflow question "Binning of data along one axis in numpy".
import torch

def histo(tensor: torch.Tensor, axis: int, nbins: int = 50) -> torch.Tensor:
    """Return the tensor where the given axis is replaced by bin counts, binning along this axis.

    Parameters
    ----------
    tensor : torch.Tensor
        input tensor
    axis : int
        axis along which to perform the transformation
    nbins : int, optional
        number of bins, by default 50

    Returns
    -------
    torch.Tensor
        We can consider the tensor as a family of lists of length tensor.shape[axis]. For each of these
        lists, the interval between its top and bottom value is divided into nbins sub-intervals, for which
        we do a bin count. The result of this bin count replaces the initial list in the output.
    """
    bottom, _ = tensor.min(dim=axis)
    top, _ = tensor.max(dim=axis)
    step = (top - bottom) / nbins
    boundaries = [bottom + i * step for i in range(nbins + 1)]
    # Move the binned axis to the front so that the boundaries, which lack that axis,
    # broadcast against the remaining axes in their original order.
    # (swapaxes(0, axis) reorders the remaining axes and breaks this for axis >= 2.)
    tensor = tensor.movedim(axis, 0)
    res = torch.zeros((nbins,) + tensor.shape[1:], device=tensor.device)
    for i in range(nbins):
        if i < nbins - 1:
            # Half-open bin [boundaries[i], boundaries[i + 1]).
            res[i] = torch.count_nonzero(
                (boundaries[i] <= tensor) & (tensor < boundaries[i + 1]), dim=0
            )
        else:
            # The last bin is closed on the right; comparing against top itself
            # avoids losing the maximum to floating point rounding.
            res[i] = torch.count_nonzero(
                (boundaries[i] <= tensor) & (tensor <= top), dim=0
            )
    return res.movedim(0, axis)