Hi all,

big fan, long time reader, first time poster.

I have a general question about assessing a model’s accuracy beyond the test loss, and I was hoping someone might have some ideas or references.

I’m using a simple NN as a surrogate model to approximate the tracking of charged particles through an electromagnetic field. My input and output vectors are phase-space vectors (x, y, z, v_x, v_y, v_z) at two positions in space, generated with a more or less conventional particle tracker. After every training epoch, besides looking at the test loss (MSELoss), I would also like to calculate the relative accuracy of every output component (e.g. mean accuracy of x = ? %), so I’m using the DataLoader to compute the mean relative error per batch and finally over the whole dataset.
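Roughly, the per-epoch evaluation pass looks like this (illustrative sketch: the tiny linear model and random data here are just stand-ins for my real surrogate and tracker-generated dataset):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def mean_relative_error(model, loader):
    """Accumulate the mean relative error per output component over a DataLoader."""
    model.eval()
    total = torch.zeros(6)
    n = 0
    with torch.no_grad():
        for inputs, targets in loader:
            preds = model(inputs)
            # This division is the problematic step when targets are near 0.
            rel = (targets - preds).abs() / targets.abs()
            total += rel.sum(dim=0)
            n += targets.shape[0]
    return total / n  # one mean relative error per output (x, y, z, v_x, v_y, v_z)

torch.manual_seed(0)
x = torch.randn(128, 6)
y = torch.randn(128, 6) * 0.1 + 2.0  # shifted away from 0 so the plain formula behaves
loader = DataLoader(TensorDataset(x, y), batch_size=32)
model = nn.Linear(6, 6)
err = mean_relative_error(model, loader)
print(err.shape)  # torch.Size([6])
```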

Since the relative error is something like (x_out - x_pred)/x_out, there are cases where x_out is very small or near 0, causing numerical issues after the division. I can pinpoint the problem by setting shuffle=False: a few batches then look really bad in terms of prediction accuracy (the initial dataset is more or less sorted, so specific clusters end up near 0), which then skews the overall result.
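A tiny example of what I mean: the same absolute error of 1e-3 gives a harmless relative error for values of order 1, but explodes for a near-zero reference value and then dominates any batch mean.

```python
import torch

x_out = torch.tensor([1.0, 0.5, 1e-6])  # last entry is "near 0"
x_pred = x_out + 1e-3                   # identical absolute error everywhere
rel = (x_out - x_pred).abs() / x_out.abs()
print(rel)  # ~[1e-3, 2e-3, 1e3] -- the near-zero entry swamps the mean
```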

I’ve been able to ameliorate this by z-score normalizing instead of scaling by the maximum when generating my torch.utils.data.Dataset, but the problem is still there to some extent.
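For reference, the normalization inside the Dataset looks roughly like this (illustrative sketch with made-up names; in practice the statistics should come from the training split only):

```python
import torch
from torch.utils.data import Dataset

class PhaseSpaceDataset(Dataset):
    """Holds z-score-normalized phase-space vectors at two positions."""

    def __init__(self, inputs, outputs):
        # Per-component mean/std over the dataset (6 values each).
        self.in_mean, self.in_std = inputs.mean(0), inputs.std(0)
        self.out_mean, self.out_std = outputs.mean(0), outputs.std(0)
        self.inputs = (inputs - self.in_mean) / self.in_std
        self.outputs = (outputs - self.out_mean) / self.out_std

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.outputs[idx]

torch.manual_seed(0)
ds = PhaseSpaceDataset(torch.randn(100, 6) * 3 + 1, torch.randn(100, 6) * 2 - 1)
```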

Is there a better way to do this besides clipping the outliers, or will this iron itself out in later epochs? I feel there is a simple solution for this but just can’t think of it.

Thanks in advance