Statistics for whole dataset

How can I calculate statistics (mean, variance, skewness, kurtosis) on a whole dataset? Loading a huge dataset at once is a problem. Is there any trick for this?

I could not find built-in functions for skewness and kurtosis for tensors.
How do I move a NumPy ndarray to the GPU in that case?

You can create a GPU tensor from a NumPy array like this:

array = torch.from_numpy(some_np_array).float().to(device)

and calculate the statistics like this:

mean = torch.mean(array)
diffs = array - mean
var = torch.mean(torch.pow(diffs, 2.0))
std = torch.pow(var, 0.5)
zscores = diffs / std
skews = torch.mean(torch.pow(zscores, 3.0))
kurtoses = torch.mean(torch.pow(zscores, 4.0)) - 3.0 
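
If the whole dataset does not fit in memory, one option is to stream it in chunks and make two passes: one for the global mean, one for the central moments. The following is only a sketch under some assumptions: `streaming_moments` and `make_chunks` are hypothetical names, `make_chunks` is assumed to be a callable that returns a fresh iterator over chunks each time it is called, and the results are population (biased) estimates with excess kurtosis, matching the formulas above.

```python
import torch

def streaming_moments(make_chunks, device="cpu"):
    """Two passes over the chunks: first the mean, then the central moments."""
    # Pass 1: global mean, accumulated in float64 to limit rounding error.
    n = 0
    total = torch.zeros((), dtype=torch.float64, device=device)
    for chunk in make_chunks():
        x = torch.as_tensor(chunk, dtype=torch.float64, device=device)
        n += x.numel()
        total += x.sum()
    mean = total / n

    # Pass 2: running sums of (x - mean)^k for k = 2, 3, 4.
    m2 = torch.zeros_like(total)
    m3 = torch.zeros_like(total)
    m4 = torch.zeros_like(total)
    for chunk in make_chunks():
        d = torch.as_tensor(chunk, dtype=torch.float64, device=device) - mean
        m2 += (d ** 2).sum()
        m3 += (d ** 3).sum()
        m4 += (d ** 4).sum()

    var = m2 / n                        # population variance
    skew = (m3 / n) / var ** 1.5        # NaN if var == 0
    kurt = (m4 / n) / var ** 2 - 3.0    # excess kurtosis
    return mean.item(), var.item(), skew.item(), kurt.item()

# Toy usage: ten values served in chunks of three.
data = torch.arange(10, dtype=torch.float64)
mean, var, skew, kurt = streaming_moments(lambda: iter(data.split(3)))
print(mean, var, skew, kurt)
```

Only one chunk is resident at a time, so peak memory is set by the chunk size rather than the dataset size. The cost is that the data has to be read twice.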

Thank you!
But what is d in diffs = d - mean?

Sorry, it's supposed to be array. I will edit the answer.

Thanks!
For a few datasets, the skewness and kurtosis values are NaN. Could you please tell me why that is? What is the statistical significance of that?

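There could be a lot of reasons. For example, if std == 0 (all values are identical), the diffs are 0 as well, so zscores = 0 / 0 = NaN. A NaN or inf already present in the input will also propagate through every step. I recommend printing each intermediate value to see which one becomes NaN first.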
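
A quick way to narrow it down is to check the usual suspects directly. This is a minimal sketch; `diagnose` is a hypothetical helper name, and it only covers the cases mentioned here (NaN or inf in the input, zero standard deviation).

```python
import torch

def diagnose(array):
    """Report common causes of NaN skewness/kurtosis for a tensor."""
    # Same population std as in the answer's formulas.
    std = (array - array.mean()).pow(2).mean().sqrt()
    return {
        "has_nan": bool(torch.isnan(array).any()),
        "has_inf": bool(torch.isinf(array).any()),
        "std_is_zero": bool(std == 0),  # constant data: zscores become 0 / 0
        "numel": array.numel(),
    }

print(diagnose(torch.ones(5)))                       # constant data
print(diagnose(torch.tensor([1.0, float("nan")])))   # NaN in the input
```

Statistically, a zero standard deviation means the data has no spread, so skewness and kurtosis are simply undefined for that dataset.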

If this solved your problem, could you please mark it as the solution?

Yes. Thanks for your support!