Reconstruction accuracy of one-hot encoded vectors

For a project I have been working on, I built a Conditional Variational Autoencoder in PyTorch to reconstruct one-hot encoded vectors representing chemical compositions from sampled latent representations. What I would like to do now is assess the accuracy of the reconstructions on a test set.

My input is a tensor test_set of shape (779, 103) containing one-hot encoded vectors (strictly speaking multi-hot, since a row can contain several ones), a sparse representation of my input data. For example:

test_set[0]

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
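Since test_set[0] has ones at indices 30, 33, 49 and 55, each row is a multi-hot vector, which matters later when choosing a metric (plain argmax accuracy assumes exactly one active entry). A minimal sketch, assuming only PyTorch, of building such a vector from component indices and reading the positions and count back; `to_multi_hot` is a hypothetical helper, not part of the original code.

```python
import torch

def to_multi_hot(indices, num_classes=103):
    """Hypothetical helper: build a multi-hot vector with ones at `indices`."""
    vec = torch.zeros(num_classes)
    vec[torch.as_tensor(indices, dtype=torch.long)] = 1.0
    return vec

# Indices of the ones in test_set[0], read off the printed tensor above
sample = to_multi_hot([30, 33, 49, 55])
print(sample.nonzero().flatten().tolist())  # [30, 33, 49, 55]
print(int(sample.sum()))                    # 4
```

On the real data, `test_set.sum(dim=1)` would give the number of ones in each row.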

My reconstructions have the same shape, (779, 103), and originally look like this:

tensor([[1.9423e-05, 1.1591e-07, 1.3582e-02,  ..., 8.8928e-08, 7.3037e-08,
         6.5609e-08],
        [2.6044e-03, 7.3856e-07, 1.2225e-02,  ..., 8.0583e-07, 6.3198e-07,
         6.5411e-07],
        [4.7741e-08, 1.0412e-07, 7.5642e-03,  ..., 9.3880e-08, 1.0886e-07,
         8.2602e-08],
        ...,
        [1.6456e-05, 3.7687e-07, 1.4820e-02,  ..., 3.2218e-07, 2.5968e-07,
         2.4550e-07],
        [1.8924e-02, 4.3774e-07, 2.4112e-02,  ..., 3.3623e-07, 3.9053e-07,
         2.5162e-07],
        [5.8450e-06, 5.3102e-08, 1.2835e-02,  ..., 3.7761e-08, 3.8559e-08,
         3.3401e-08]])

What I have done so far is set every entry below a fixed threshold (0.15) to zero, ending up with something like this:

tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.2581, 0.0000, 0.1610, 0.9947, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000, 0.0000, 0.0000, 0.0000])
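The two steps described here (zero everything below the threshold, then set the surviving entries to one) collapse into a single comparison. A minimal sketch, assuming the reconstructions are a float tensor of probabilities; `binarize` and `sparsify` are hypothetical helper names and the toy values are made up, except that the first row reuses the three surviving values printed above.

```python
import torch

def sparsify(recon: torch.Tensor, threshold: float = 0.15) -> torch.Tensor:
    """Zero out entries below `threshold` but keep the surviving probabilities."""
    return torch.where(recon >= threshold, recon, torch.zeros_like(recon))

def binarize(recon: torch.Tensor, threshold: float = 0.15) -> torch.Tensor:
    """Set entries below `threshold` to 0 and all remaining entries to 1."""
    return (recon >= threshold).float()

# Toy reconstruction of shape (2, 5) with made-up values
recon = torch.tensor([[0.02, 0.2581, 0.1610, 0.9947, 0.0001],
                      [0.40, 0.0300, 0.1200, 0.1400, 0.8000]])
print(sparsify(recon))  # the intermediate view shown above
print(binarize(recon))  # 0/1 predictions ready for evaluation
```

`sparsify` reproduces the intermediate tensor in the post; `binarize` goes straight to the 0/1 predictions.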

(Side question: how could I choose a threshold that helps distinguish meaningful coordinates from noise?)
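One common approach, swapped in here rather than taken from the post, is to treat the threshold as a hyperparameter: sweep candidate values and keep the one that maximizes a metric such as micro-averaged F1. This should be done on a validation split (or the training data), not on the test set, otherwise the test score is optimistically biased. A minimal sketch, assuming binary float targets; `micro_f1`, `pick_threshold` and the candidate grid are hypothetical choices.

```python
import torch

def micro_f1(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Micro-averaged F1 over all entries of two 0/1 float tensors."""
    tp = (pred * target).sum()
    fp = (pred * (1 - target)).sum()
    fn = ((1 - pred) * target).sum()
    denom = 2 * tp + fp + fn
    return (2 * tp / denom).item() if denom > 0 else 0.0

def pick_threshold(recon, target, candidates=None):
    """Return (threshold, score) for the candidate with the highest micro-F1."""
    if candidates is None:
        candidates = torch.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    scores = [micro_f1((recon >= t).float(), target) for t in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)  # first best on ties
    return candidates[best].item(), scores[best]

# Toy validation data with made-up values
val_recon = torch.tensor([[0.92, 0.33, 0.07], [0.22, 0.61, 0.13]])
val_target = torch.tensor([[1., 1., 0.], [0., 1., 0.]])
thr, score = pick_threshold(val_recon, val_target)
print(thr, score)
```

A precision-recall curve over the flattened validation entries (for example with scikit-learn's `precision_recall_curve`) gives the same information for every threshold at once; per-column thresholds are also possible if some components are systematically under- or over-predicted.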

Now, after applying a threshold and setting the remaining nonzero entries to one, how could I evaluate the accuracy of my reconstructions over the whole test set?
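A minimal sketch of several metrics, assuming the predictions are already binarized and have the same (N, 103) shape as the targets. Exact match is strict (the whole composition must be right), while element-wise accuracy is inflated by the many zeros: for a row with 4 ones out of 103, like test_set[0], an all-zero prediction already scores 99/103 ≈ 96%. Precision, recall, F1 and the per-sample Jaccard index focus on the positive entries and are usually more informative. `reconstruction_metrics` is a hypothetical name, and the example treats the thresholded vector printed above as the prediction for test_set[0], which is an assumption since the post does not say which row it came from.

```python
import torch

def reconstruction_metrics(pred: torch.Tensor, target: torch.Tensor) -> dict:
    """Compare 0/1 predictions with 0/1 targets, both of shape (N, D)."""
    pred, target = pred.bool(), target.bool()
    tp = (pred & target).sum().item()
    fp = (pred & ~target).sum().item()
    fn = (~pred & target).sum().item()
    # Per-sample Jaccard index |pred ∩ target| / |pred ∪ target| (1 if both empty)
    inter = (pred & target).sum(dim=1).float()
    union = (pred | target).sum(dim=1).float()
    jaccard = torch.where(union > 0, inter / union, torch.ones_like(union))
    return {
        "exact_match": (pred == target).all(dim=1).float().mean().item(),
        "element_accuracy": (pred == target).float().mean().item(),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "mean_jaccard": jaccard.mean().item(),
    }

# Prediction: surviving indices of the thresholded vector above
pred = torch.zeros(1, 103)
pred[0, [30, 32, 33]] = 1.0
# Target: ones of test_set[0]
target = torch.zeros(1, 103)
target[0, [30, 33, 49, 55]] = 1.0
print(reconstruction_metrics(pred, target))
```

For this pair the element-wise accuracy is 100/103 ≈ 0.971 even though half of the true components are missed (recall 0.5). On the full test set the call would look like `reconstruction_metrics((reconstructions >= threshold).float(), test_set)`, where `reconstructions` stands for the (779, 103) decoder output.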