Filtered mean and std from a tensor

Jorge_Garcia · March 23, 2022, 12:53pm

Hello!

I would like to get the mean and standard deviation of a tensor with shape (H,W) along dimension 1 (so the output would be a pair of tensors with shape (H) or (H,1)). My problem is that I need to filter each row and use only the values selected by a given mask, and the mask selects a different number of values from each row.

My current approach is to generate another tensor with NaNs where I don't care about the value and use torch.nanmean to get the mean ignoring those values, but I can't find an analogous function to get the std.

Example of behaviour.

>>> input = torch.tensor([[-1., 4., 3., 8.],
...                       [7., 5., -2., -9.]])
>>> input
tensor([[-1.,  4.,  3.,  8.],
        [ 7.,  5., -2., -9.]])
>>> mask = input >= 0
>>> input_nans = torch.where(mask, input, torch.nan)
>>> mean = torch.nanmean(input_nans, dim=1)
>>> mean
tensor([5., 6.])

This works for the mean (here, the mean of the values greater than or equal to 0), but there is no nanstd function to get the standard deviation ignoring NaNs. It's important to note that the first value is a mean over 3 values while the second is a mean over 2 values (that is why I cannot simply filter to get a tensor with only the desired values: the row lengths would not match).

Any suggestions?

shivammehta007 · March 23, 2022, 1:02pm

If you already have a mask, you can use torch.masked_select or just index with it.

>>> a = torch.randn(3, 3)
>>> a
tensor([[-1.6599, -0.0141, -0.9498],
        [ 0.5469, -0.0643, -2.0145],
        [ 0.4915, -1.7717, -1.9315]])
>>> mask = torch.randn(3, 3) > 0
>>> mask
tensor([[False, False, False],
        [ True, False, False],
        [ True, False,  True]])
>>> a[mask]
tensor([ 0.5469,  0.4915, -1.9315])
>>> a[mask].mean()
tensor(-0.2977)
>>> a[mask].std()
tensor(1.4152)
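
Note that `a[mask]` flattens the selection into a 1-D tensor, so the statistics above are taken over all selected values at once, not per row. A minimal sketch of applying the same indexing row by row, using the values from the question (the helper name `masked_row_stats` is made up for illustration, and a Python loop like this may be slow for large H):

```python
import torch

def masked_row_stats(x, mask):
    """Per-row mean and std over the entries where mask is True (hypothetical helper)."""
    means, stds = [], []
    for row, row_mask in zip(x, mask):
        selected = row[row_mask]      # 1-D tensor; its length varies per row
        means.append(selected.mean())
        stds.append(selected.std())   # torch.std applies Bessel's correction by default
    return torch.stack(means), torch.stack(stds)

x = torch.tensor([[-1., 4., 3., 8.],
                  [7., 5., -2., -9.]])
mean, std = masked_row_stats(x, x >= 0)
print(mean)  # tensor([5., 6.])
print(std)   # approximately tensor([2.6458, 1.4142])
```

Rows with fewer than two selected values would give NaN for the std here, so that case may need separate handling.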

jayz · March 23, 2022, 1:37pm

Maybe implement your own nanstd?

def nanstd(x): 
    return torch.sqrt(torch.mean(torch.pow(x-torch.nanmean(x,dim=1).unsqueeze(-1),2)))

or something similar.

Jorge_Garcia · March 23, 2022, 2:18pm

That is not my desired behaviour. I've extended the question with an example. I hope it is more understandable now. Thanks!

Jorge_Garcia · March 23, 2022, 2:18pm

With that function, the output would include NaNs because x can include NaNs. I've extended the question with an example. Thanks

ataxias · March 29, 2023, 4:12am

I think that jayz's answer needed a small modification:

def nanstd(x): 
    return torch.sqrt(torch.nanmean(torch.pow(x - torch.nanmean(x, dim=-1).unsqueeze(-1), 2), dim=-1))

Wouldn’t this work for you?
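
One thing to keep in mind: taking `nanmean` of the squared deviations divides by the number of non-NaN values N, so the function above gives the population standard deviation, whereas `torch.std` divides by N - 1 by default. A rough sketch of a variant with an adjustable divisor follows; the name `nanstd_corrected` and its `correction` parameter are assumptions for illustration, not an existing PyTorch function:

```python
import torch

def nanstd_corrected(x, dim=-1, correction=1):
    """NaN-ignoring std; divides by (N - correction), N = non-NaN count along dim."""
    count = (~torch.isnan(x)).sum(dim=dim)
    mean = torch.nanmean(x, dim=dim, keepdim=True)
    # nansum treats the NaN entries of the squared deviations as zero
    sq_dev_sum = torch.nansum((x - mean) ** 2, dim=dim)
    return torch.sqrt(sq_dev_sum / (count - correction))

nan = float("nan")
x = torch.tensor([[nan, 4., 3., 8.],
                  [7., 5., nan, nan]])
print(nanstd_corrected(x))                # approximately tensor([2.6458, 1.4142])
print(nanstd_corrected(x, correction=0))  # approximately tensor([2.1602, 1.0000])
```

With `correction=0` this should match the `nanmean`-based version; rows where the non-NaN count does not exceed `correction` will not produce a finite result.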

Jeremyywb · June 18, 2023, 6:09am

def nanstd(o, dim):
    return torch.sqrt(
        torch.nanmean(
            torch.pow(torch.abs(o - torch.nanmean(o, dim=dim).unsqueeze(dim)), 2),
            dim=dim,
        )
    )

nanstd(o, 1)

This should work.

E.g., for an input of shape (bs, seq, dim), the output has shape (bs, dim): the seq dimension is reduced, giving the std along the seq axis while ignoring NaNs.

allenyllee · March 5, 2024, 5:12am

Sometimes you might want to set keepdim=True. Here is a modified version:

def nanstd(o, dim, keepdim=False):
    result = torch.sqrt(
        torch.nanmean(
            torch.pow(torch.abs(o - torch.nanmean(o, dim=dim).unsqueeze(dim)), 2),
            dim=dim,
        )
    )

    if keepdim:
        result = result.unsqueeze(dim)

    return result
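
As a quick check of the shapes described in this thread, here is a compact sketch that relies on the `keepdim` argument of `torch.nanmean` instead of `unsqueeze` (the name `nanstd_compact` is made up here; the `torch.abs` is dropped because squaring already removes the sign):

```python
import torch

def nanstd_compact(o, dim, keepdim=False):
    """Population std along dim, ignoring NaNs (illustrative variant)."""
    # keepdim=True keeps the reduced axis so the mean broadcasts against o
    mean = torch.nanmean(o, dim=dim, keepdim=True)
    return torch.sqrt(torch.nanmean((o - mean) ** 2, dim=dim, keepdim=keepdim))

o = torch.randn(2, 5, 3)       # (bs, seq, dim)
o[0, 1, 2] = float("nan")      # one missing value along the seq axis
flat = nanstd_compact(o, 1)
kept = nanstd_compact(o, 1, keepdim=True)
print(flat.shape)  # torch.Size([2, 3])
print(kept.shape)  # torch.Size([2, 1, 3])
```

Like the versions above, this divides by the number of non-NaN values, so it is the population rather than the sample standard deviation.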