Discrepancy in RMSE Calculation Results between Functions

ababey · August 31, 2023, 9:01am

Hello all,

I’m encountering an issue in my PyTorch code involving RMSE calculations. I have two functions that calculate the RMSE between corresponding feature maps in two lists. Both functions should do the same thing: compute the RMSE for a list of 5 tensors. The difference is that in the second one I iterate over the batch dimension manually.

The strange part is that while both functions seem logically equivalent and run without errors, they yield significantly different RMSE values.

Here’s a brief overview of my code, with the loss each function prints:

import torch

def calculate_rmse(A, B):
    diff_squared = (A - B)**2
    mean_squared_error = torch.mean(diff_squared)
    rmse = torch.sqrt(mean_squared_error)
    return rmse



def calculate_total_rmse_list(original_feature_maps_list, reconstructed_feature_maps_list):
    total_rmse = 0.0
    for i in range(len(original_feature_maps_list)):
        rmse = calculate_rmse(original_feature_maps_list[i], reconstructed_feature_maps_list[I])
        total_rmse += rmse
    return total_rmse
Epoch 1/200, Loss: 6.661827330362229

def calculate_total_rmse_list2(original_feature_maps_list, reconstructed_feature_maps_list, batch_size):
    total_rmse = 0.0
    for i in range(len(original_feature_maps_list)):
        original_feature_map = original_feature_maps_list[I]
        reconstructed_feature_maps = reconstructed_feature_maps_list[I]
        rmse_batch = 0.0
        for j in range(batch_size):
            rmse = calculate_rmse(original_feature_map[j,:,:,:], reconstructed_feature_maps[j,:,:,:])
            rmse_batch += rmse
        total_rmse += rmse_batch
    return total_rmse

Epoch 1/200, Loss: 418.642820085798

I do not understand why there is a difference between the two functions: same data, same code. I wrote the second function to debug the first one, but it only made things harder to understand. It looks as if PyTorch cannot handle the difference of two 4-dimensional tensors, but I know it can.

Can someone help me understand why there is a difference?

KFrank · August 31, 2023, 5:57pm

Hi Ababey!

ababey:

def calculate_total_rmse_list(original_feature_maps_list, reconstructed_feature_maps_list):
    total_rmse = 0.0
    for i in range(len(original_feature_maps_list)):
        rmse = calculate_rmse(original_feature_maps_list[i], reconstructed_feature_maps_list[I])
        total_rmse += rmse
    return total_rmse
Epoch 1/200, Loss: 6.661827330362229

def calculate_total_rmse_list2(original_feature_maps_list, reconstructed_feature_maps_list, batch_size):
    total_rmse = 0.0
    for i in range(len(original_feature_maps_list)):
        original_feature_map = original_feature_maps_list[I]
        reconstructed_feature_maps = reconstructed_feature_maps_list[I]
        rmse_batch = 0.0
        for j in range(batch_size):
            rmse = calculate_rmse(original_feature_map[j,:,:,:], reconstructed_feature_maps[j,:,:,:])
            rmse_batch += rmse
        total_rmse += rmse_batch
    return total_rmse

Epoch 1/200, Loss: 418.642820085798

You have three things going on here:

First, uppercase ‘I’ isn’t defined anywhere in the code you posted. You
might have a typo and meant lowercase ‘i’. In any event, an inconsistent
value for I would definitely mess things up.

Second, when you compute calculate_rmse() in your first version for
original_feature_maps_list[i], torch.mean() divides by the number
of elements of original_feature_maps_list[i]. In the second version,
torch.mean() only divides by the smaller number of elements in
original_feature_map[j,:,:,:].

Finally, your second version sums up rmse to form rmse_batch after the
square root has been taken, while in the first version the squared errors of
the batch elements are summed up before taking the square root, which is
not equivalent.
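
To make the second and third points concrete, here is a small worked example in plain Python (no PyTorch needed) with made-up error values for a batch of two samples. The numbers and the helper name rmse are illustrative assumptions, not taken from the posted code.

```python
import math

# Hypothetical per-element errors (A - B) for a batch of two samples,
# chosen so the arithmetic is easy to check by hand.
errors = [
    [3.0, 3.0],  # sample 0
    [4.0, 4.0],  # sample 1
]

def rmse(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Like the first version: one mean over every element in the batch,
# followed by a single square root.
all_errors = [v for sample in errors for v in sample]
rmse_whole_batch = rmse(all_errors)  # sqrt((9 + 9 + 16 + 16) / 4) = sqrt(12.5)

# Like the second version: one square root per sample, then a sum.
rmse_summed = sum(rmse(sample) for sample in errors)  # 3 + 4 = 7

# Averaging the per-sample mean squared errors *before* the square root
# recovers the whole-batch value (valid when all samples have equal size).
per_sample_mse = [sum(v * v for v in s) / len(s) for s in errors]
rmse_reconciled = math.sqrt(sum(per_sample_mse) / len(per_sample_mse))

print(rmse_whole_batch, rmse_summed, rmse_reconciled)
```

Here the whole-batch result is sqrt(12.5), about 3.54, while the per-sample sum is 7.0. Only the last variant, which averages the squared errors first and takes the square root once, agrees with the whole-batch value.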

If you’re still having issues after sorting through the above, please post a
fully-self-contained (using hard-coded or random data), runnable script that
reproduces your issue, together with the results you get when you run that script.
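
As an illustration of what such a script could look like, here is a minimal sketch using random data. The tensor shapes, the batch size of 8, and the function names other than calculate_rmse are assumptions made up for the example; the loop index is written as lowercase i/j throughout.

```python
import torch

torch.manual_seed(0)  # fixed seed so repeated runs use the same random data

def calculate_rmse(A, B):
    return torch.sqrt(torch.mean((A - B) ** 2))

def total_rmse_whole_batch(originals, reconstructions):
    # One RMSE per feature-map tensor, computed over the entire batch.
    total = 0.0
    for a, b in zip(originals, reconstructions):
        total = total + calculate_rmse(a, b)
    return total

def total_rmse_per_sample(originals, reconstructions):
    # One RMSE per sample, summed over the batch (square root before sum).
    total = 0.0
    for a, b in zip(originals, reconstructions):
        for j in range(a.shape[0]):
            total = total + calculate_rmse(a[j], b[j])
    return total

def total_rmse_per_sample_reconciled(originals, reconstructions):
    # Per-sample mean squared errors are averaged first, then a single
    # square root is taken, which matches the whole-batch computation
    # when every sample has the same number of elements.
    total = 0.0
    for a, b in zip(originals, reconstructions):
        mse = torch.stack(
            [torch.mean((a[j] - b[j]) ** 2) for j in range(a.shape[0])]
        )
        total = total + torch.sqrt(mse.mean())
    return total

# Hypothetical shapes: 5 feature maps laid out as (batch, channels, H, W).
batch_size = 8
shapes = [
    (batch_size, 4, 16, 16),
    (batch_size, 8, 16, 16),
    (batch_size, 8, 8, 8),
    (batch_size, 16, 8, 8),
    (batch_size, 16, 4, 4),
]
originals = [torch.randn(s) for s in shapes]
reconstructions = [torch.randn(s) for s in shapes]

v1 = total_rmse_whole_batch(originals, reconstructions)
v2 = total_rmse_per_sample(originals, reconstructions)
v3 = total_rmse_per_sample_reconciled(originals, reconstructions)

print("whole batch:           ", v1.item())
print("per-sample sum:        ", v2.item())
print("per-sample, reconciled:", v3.item())
```

When the samples have similar error levels, the per-sample sum should come out close to batch_size times the whole-batch value, and never more than that. The two losses quoted above differ by a factor of about 63, which is the kind of gap this effect would produce with a batch of several dozen samples, although the thread does not state the batch size.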

Good luck!

K. Frank