Standard deviation

julioeu99 · May 27, 2020, 9:55pm

Hi, I want to calculate the standard deviation of this distance:
de+=distance.euclidean(output,input)

That's what I tried:
print(de.std(dim=1))

Then I get the following error: AttributeError: 'float' object has no attribute 'std'
What am I doing wrong here?

Nikronic · May 28, 2020, 11:44am

Hi,

I think, based on your error, that the Euclidean distance function returns a single float for any two n-dimensional arrays, so de is always a float. A standard deviation cannot be computed for a single float.
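
To make this concrete, here is a minimal sketch, assuming the two inputs are 2-D batches with one sample per row. The array names, shapes, and random data are illustrative and not taken from the original post, and `np.linalg.norm` of the difference is used as a stand-in for `distance.euclidean`. It keeps one distance per sample in an array instead of summing them into a float, so that a standard deviation is defined.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batches: 4 samples with 784 features each.
output = rng.standard_normal((4, 784))
target = rng.standard_normal((4, 784))  # named `target` to avoid shadowing the built-in `input`

# One Euclidean distance per sample (row), kept as an array rather than a running sum.
distances = np.linalg.norm(output - target, axis=1)

print(distances.shape)   # one entry per sample
print(distances.mean())  # mean of the per-sample distances
print(distances.std())   # standard deviation of the per-sample distances
```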

Could you please tell me which library you are using to calculate the Euclidean distance?

But typically, you can calculate it using torch.dist(output, input, 2). Here is the documentation.

Bests

julioeu99 · May 28, 2020, 7:13pm

Wrong question, sorry. How can I calculate the mean of de?

       ne=ne.detach().cpu().numpy()           
       naa=naa.detach().cpu().numpy()           
       de=0
       for ne, naa in zip(ne, naa):          
         de+=distance.euclidean(naa,ne)

Nikronic · May 28, 2020, 7:32pm

I think torch.dist(ne, naa, 2) will do the trick. No need to convert it to numpy or transfer it to CPU.

julioeu99 · May 28, 2020, 7:39pm

I need to do what I did for other things in the program. What I want is just to calculate the mean of the values held in the variable de.
I know that it is very easy, but can you explain to me how I can do that?

Nikronic · May 28, 2020, 8:01pm

Could you please print the shape of ne and naa?

julioeu99 · May 28, 2020, 8:05pm

Of course. Both have shape (784,).

Nikronic · May 28, 2020, 8:13pm

With inputs of this shape, de can only be a single float, and the mean of a single number is the number itself.
In the loop you posted, you are adding up 784 distances, which results in a single number. Are you looking for a list of 784 separate distances, and then the mean of those?

julioeu99 · May 28, 2020, 8:22pm

This loop keeps printing the distance values and ends when it has finished iterating over all the data.

      ne=ne.detach().cpu().numpy()           
      naa=naa.detach().cpu().numpy()           
      de=0
      for ne, naa in zip(ne, naa):          
        de+=distance.euclidean(naa,ne)
      print(de/size_batch)

I want to calculate the mean of all the printed values. Is it clear now what I want to do?
The print outputs values like:
3.21412
3.124124
3.112412
3.42121
And I want to calculate the mean of all these printed values.

Nikronic · May 28, 2020, 8:42pm

In that case you are doing fine. Just use different variable names for the loop:

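import numpy as np

# Random stand-ins for the real data, matching the (784,) shape reported above.
ne = np.random.randn(784)
naa = np.random.randn(784)

de = np.zeros((1, ))
for ne_, naa_ in zip(ne, naa):
    # For scalars, sqrt of the squared difference is the absolute difference.
    de += np.sqrt((ne_ - naa_)**2)
print(de / ne.shape[0])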

But loops are not efficient. A more optimized way to do this with NumPy:

np.sum(np.sqrt((ne - naa)**2)) / ne.shape[0]

Sklearn:

from sklearn.metrics.pairwise import euclidean_distances
np.sum(np.diag(euclidean_distances(ne.reshape(-1, 1), naa.reshape(-1, 1))))/ne.shape[0]
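
As a quick sanity check, the sketch below compares the loop with the vectorized NumPy expression on made-up data (the seed and array length are assumptions). For 1-D inputs, `np.sqrt((a - b)**2)` is just the absolute difference, so both results should also match `np.mean(np.abs(ne - naa))`.

```python
import numpy as np

rng = np.random.default_rng(0)
ne = rng.standard_normal(784)
naa = rng.standard_normal(784)

# Loop version: accumulate element-wise distances, then divide by the count.
de = 0.0
for ne_, naa_ in zip(ne, naa):
    de += np.sqrt((ne_ - naa_) ** 2)
loop_mean = de / ne.shape[0]

# Vectorized version: same computation without the Python loop.
vec_mean = np.sum(np.sqrt((ne - naa) ** 2)) / ne.shape[0]

# Equivalent shorthand: mean absolute difference.
abs_mean = np.mean(np.abs(ne - naa))

print(np.isclose(loop_mean, vec_mean), np.isclose(vec_mean, abs_mean))
```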

Sorry that I did not understand your question very well.

Bests

julioeu99 · May 28, 2020, 10:14pm

Thank you for the detailed explanation, you helped me a lot. Just out of curiosity, how can I calculate the mean in my previous loop?
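
One possible way to do this, sketched under assumptions: append each printed per-batch value to a list and average the list once the outer loop finishes. The outer loop, `num_batches`, the value of `size_batch`, and the random arrays below are hypothetical stand-ins for the real data loader, and `np.linalg.norm` stands in for `distance.euclidean`. The inner loop uses separate names (`ne_row`, `naa_row`) so the batch arrays are not overwritten.

```python
import numpy as np

rng = np.random.default_rng(0)
size_batch = 8   # hypothetical batch size
num_batches = 5  # hypothetical number of batches

batch_means = []  # one value per batch, i.e. each number that was printed
for _ in range(num_batches):
    # Stand-ins for ne.detach().cpu().numpy() and naa.detach().cpu().numpy().
    ne = rng.standard_normal((size_batch, 784))
    naa = rng.standard_normal((size_batch, 784))

    de = 0.0
    for ne_row, naa_row in zip(ne, naa):
        de += np.linalg.norm(naa_row - ne_row)  # Euclidean distance of one pair
    batch_mean = de / size_batch
    print(batch_mean)
    batch_means.append(batch_mean)

# Mean and standard deviation of all printed values.
print(np.mean(batch_means), np.std(batch_means))
```

If the last batch is smaller than the others, dividing by its actual length rather than `size_batch` keeps each value a true per-batch mean.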