Triplet Loss OOM CUDA (A100 + Small Model)

Besides the input data and the model’s parameters and buffers, the intermediate forward activations (which are kept for the backward pass) can use a lot of memory, depending on the model architecture. I don’t know how you’ve measured the memory usage, but this post explains it in more detail for a ResNet.
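
To make this concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not taken from your model: the layer widths, input resolution, and batch size are made-up values, and it assumes float32 tensors and 3x3 convolutions with stride 1 and "same" padding. It only counts the conv outputs, so a real network (with norm layers, nonlinearities, and optimizer state) would use more.

```python
# Rough float32 memory estimate for a small stack of 3x3 conv layers.
# All sizes below are hypothetical illustration values, not from the thread.

BYTES_PER_FLOAT32 = 4


def conv_param_count(in_ch, out_ch, kernel=3):
    # Weights (out_ch x in_ch x k x k) plus one bias per output channel.
    return in_ch * out_ch * kernel * kernel + out_ch


def conv_output_numel(batch, out_ch, height, width):
    # Stride 1 with "same" padding keeps the spatial size unchanged.
    return batch * out_ch * height * width


def estimate(batch, height, width, channels):
    """Return (parameter bytes, stored conv-output bytes) for the stack."""
    param_bytes = 0
    activation_bytes = 0
    for in_ch, out_ch in zip(channels[:-1], channels[1:]):
        param_bytes += conv_param_count(in_ch, out_ch) * BYTES_PER_FLOAT32
        activation_bytes += (
            conv_output_numel(batch, out_ch, height, width) * BYTES_PER_FLOAT32
        )
    return param_bytes, activation_bytes


param_bytes, activation_bytes = estimate(
    batch=64, height=224, width=224, channels=[3, 32, 32, 32]
)
print(f"parameters:  {param_bytes / 1024**2:.2f} MiB")
print(f"activations: {activation_bytes / 1024**2:.2f} MiB")
```

Under these assumptions the parameters take well under 1 MiB while the stored conv outputs take over 1 GiB, so a "small" model can still run out of memory. With a triplet loss, the anchor, positive, and negative batches each go through the network, so the activation term grows accordingly. On an actual run, `torch.cuda.max_memory_allocated()` would give you the measured peak to compare against.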
