Statistics for whole dataset

poojitharamachandra · March 27, 2020, 8:44am

How can I calculate statistics (mean, variance, skewness, kurtosis) over a whole dataset? Loading a huge dataset all at once is a problem. Is there any trick for this?

I could not find built-in functions for skewness or kurtosis for tensors.
Also, how do I move a NumPy ndarray to the GPU in that case?

jubick · March 27, 2020, 8:50am

You can create a GPU tensor from a NumPy array like this:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
array = torch.from_numpy(some_np_array).float().to(device)

and calculate like this:

mean = torch.mean(array)
diffs = array - mean
var = torch.mean(torch.pow(diffs, 2.0))  # population variance (divides by N)
std = torch.pow(var, 0.5)
zscores = diffs / std  # NaN when std == 0
skews = torch.mean(torch.pow(zscores, 3.0))
kurtoses = torch.mean(torch.pow(zscores, 4.0)) - 3.0  # excess kurtosis
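
The lines above need the whole array in memory. If the dataset is too large for that, one possible approach is to go over it in chunks twice: once for the global mean, once for the sums of powered deviations. This is only a sketch. The names streaming_moments and chunks_fn are made up for illustration, and chunks_fn is assumed to return a fresh iterable of NumPy arrays or tensors every time it is called. Accumulating in float64 is my choice to limit rounding error, not a requirement.

```python
import torch


def streaming_moments(chunks_fn, device="cpu"):
    """Two-pass mean, variance, skewness and excess kurtosis over chunks.

    chunks_fn is a hypothetical callable that returns a fresh iterable of
    NumPy arrays or tensors each time it is called.
    """
    # Pass 1: element count and running sum give the global mean.
    n = 0
    total = torch.zeros((), dtype=torch.float64, device=device)
    for chunk in chunks_fn():
        x = torch.as_tensor(chunk, dtype=torch.float64, device=device)
        n += x.numel()
        total += x.sum()
    mean = total / n

    # Pass 2: accumulate sums of 2nd, 3rd and 4th powers of the deviations.
    m2 = torch.zeros_like(total)
    m3 = torch.zeros_like(total)
    m4 = torch.zeros_like(total)
    for chunk in chunks_fn():
        d = torch.as_tensor(chunk, dtype=torch.float64, device=device) - mean
        m2 += (d ** 2).sum()
        m3 += (d ** 3).sum()
        m4 += (d ** 4).sum()

    var = m2 / n                       # population variance, as above
    skew = (m3 / n) / var ** 1.5       # same as mean(zscores ** 3)
    kurt = (m4 / n) / var ** 2 - 3.0   # same as mean(zscores ** 4) - 3
    return mean.item(), var.item(), skew.item(), kurt.item()
```

Only one chunk is on the device at a time, so memory use depends on the chunk size rather than the dataset size. The result should match the single-array version up to floating-point rounding.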

poojitharamachandra · March 27, 2020, 10:36am

Thank you!
But what is d in diffs = d - mean?

jubick · March 27, 2020, 12:27pm

I’m sorry, it’s supposed to be array. I will edit the answer.

poojitharamachandra · March 28, 2020, 1:23pm

thanks!
For a few datasets, the skewness and kurtosis values are NaN. Could you please tell me why that is? What is the statistical significance of that?

jubick · March 30, 2020, 9:37am

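It could be a lot of reasons. For example, if std == 0 then every entry of diffs is also 0, so zscores = 0 / 0 = NaN, and that NaN carries through to skews and kurtoses. NaN values already present in the data would do the same. In the std == 0 case the NaN has no statistical meaning of its own: skewness and kurtosis are simply undefined for data with zero variance. I recommend printing each intermediate value to see which one becomes NaN first.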
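
A rough way to check for the two causes I can think of (non-finite input and zero variance) before dividing is sketched below. The function name checked_moments is made up, and returning None for the undefined case is just one possible convention.

```python
import torch


def checked_moments(array):
    """Return (skewness, excess kurtosis), or (None, None) if undefined."""
    # NaN or inf in the input would propagate into every statistic.
    if not torch.isfinite(array).all():
        print("input contains NaN or inf")
        return None, None

    diffs = array - array.mean()
    var = diffs.pow(2).mean()

    # Zero variance means zscores would be 0 / 0, which is NaN.
    if var == 0:
        print("variance is zero; skewness and kurtosis are undefined")
        return None, None

    zscores = diffs / var.sqrt()
    skew = zscores.pow(3).mean()
    kurt = zscores.pow(4).mean() - 3.0
    return skew.item(), kurt.item()
```

If neither message is printed and the result is still NaN, printing mean, var and std one by one is the next thing to try.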

jubick · March 31, 2020, 9:04am

If this solved your problem could you please mark it as a solution?

poojitharamachandra · April 6, 2020, 10:35am

yes. thanks for your support!