How to properly save embeddings?

I have a big image encoder and a lot of (more than 200K) high-resolution images. I want to save all the image embeddings. I tried a TensorDataset, but when I check whether loading a saved, pre-calculated embedding matches forwarding the same image again, I see a small difference in the representation. In particular:
torch.max(saved_emb - same_image_forwarded)
Out[8]: tensor(3.1292e-07)
torch.sum(saved_emb - same_image_forwarded)
Out[9]: tensor(-7.6951e-06)

Is there something I'm missing, or do I just have a bug in my code that I need to find?

Thanks in advance, and sorry for my English :slight_smile:

I don’t know how you are computing these embeddings, but I guess they are the output features of a model.
If so, you are most likely seeing the expected small errors caused by the limited floating point precision and a different order of operations.
If you need more precision, you might need to use a wider dtype, such as float64, at a performance penalty.
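For example, something along these lines (a minimal, self-contained sketch; the tiny encoder and input are just stand-ins for your own model and images):

import torch
import torch.nn as nn

# Toy stand-in for the image encoder; any nn.Module works the same way.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16)).double().eval()
image = torch.randn(1, 3, 8, 8, dtype=torch.float64)  # input must match the wider dtype

with torch.no_grad():  # no gradients needed when just extracting features
    emb = encoder(image)
print(emb.dtype)       # torch.float64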

Thanks for the quick response! Yes, these are output features of a model.
I thought that putting the model in evaluation mode and setting a seed would let me get exactly the same feature embeddings.
What do you mean by “a different order of operations”?

I’m computing these embeddings because the image encoder is just my frozen backbone; I only want to fine-tune a simple adapter on top of it. So I would like to store the feature embeddings and train only the adapter, since the backbone stays frozen.
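Simplified, the caching step I have in mind looks roughly like this (the toy backbone, random images, and file name are just placeholders for my real setup):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholders for the real frozen backbone and image dataset.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128)).eval()
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16)

embs, targets = [], []
with torch.no_grad():  # backbone is frozen, so no autograd is needed
    for x, y in loader:
        embs.append(backbone(x).cpu())
        targets.append(y)

# Save once; later load the tensors and train only the adapter on them.
torch.save({"emb": torch.cat(embs), "labels": torch.cat(targets)}, "embeddings.pt")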

A different order of operations can cause small but expected numerical differences, as seen in this small example using sum:

import torch

x = torch.randn(100, 100)
s1 = x.sum()           # reduce all 10,000 elements in one call
s2 = x.sum(0).sum()    # reduce each column first, then the column sums
print((s1 - s2).abs().max())
# tensor(1.5259e-05)

If you want deterministic results, you would have to check the Reproducibility docs.
Note that these small errors are usually harmless; if your model or use case is sensitive to errors in the ~1e-5 range, you might need to use a wider dtype, as already mentioned.
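For completeness, the Reproducibility docs boil down to something like this sketch (these are the documented PyTorch APIs; the CUDA-related lines only matter on GPU):

import torch

torch.manual_seed(0)                      # seed the default RNGs
torch.use_deterministic_algorithms(True)  # raise an error on nondeterministic ops

# On CUDA you may additionally need, per the docs:
# torch.backends.cudnn.benchmark = False
# and e.g. CUBLAS_WORKSPACE_CONFIG=:4096:8 set in the environment.

Keep in mind that determinism makes a given code path repeatable; it does not make two different operation orders (e.g. a batched vs. a single-image forward pass) agree bitwise.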

Perfect! Thank you very much!