Triplet Loss OOM CUDA (A100 + Small Model)

ptrblck · September 6, 2023, 2:42pm

Besides the input data and the model’s parameters and buffers, the intermediate forward activations (which are stored for the backward pass) can use a lot of memory, depending on the model architecture. I don’t know how you’ve measured the memory usage, but this post explains it in more detail for a ResNet.
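A rough back-of-the-envelope estimate shows why activations, not parameters, tend to dominate. The sketch below assumes float32 (4 bytes per element) and a hypothetical small conv stack (`LAYERS`, `estimate`, and `conv_out` are illustrative names, not from the thread). It only counts each conv layer's output, so a real autograd graph can hold more, e.g. ReLU or BatchNorm outputs. It also assumes that anchor, positive, and negative each go through the model before a single `backward()` call, which keeps three sets of activations alive at once.

```python
# Back-of-the-envelope memory estimate, assuming float32 (4 bytes/element).
# LAYERS is a HYPOTHETICAL small CNN, not the model from the original thread.
BYTES_PER_FLOAT32 = 4
TRIPLET_BRANCHES = 3  # anchor, positive, negative (assumed separate forward passes)

# (out_channels, kernel, stride, padding) for each conv layer
LAYERS = [(64, 3, 1, 1), (64, 3, 1, 1), (128, 3, 2, 1), (128, 3, 1, 1)]


def conv_out(size, kernel, stride, padding):
    """Spatial output size of a 2D conv with square input and kernel."""
    return (size + 2 * padding - kernel) // stride + 1


def estimate(batch, in_ch, size, layers):
    """Return (param_bytes, activation_bytes) for a stack of conv layers.

    Activation bytes count only each layer's output tensor, which autograd
    keeps around for the backward pass; real usage is usually higher.
    """
    params = 0
    acts = 0
    c, s = in_ch, size
    for out_c, k, stride, pad in layers:
        params += out_c * c * k * k + out_c  # conv weights + bias
        s = conv_out(s, k, stride, pad)
        acts += batch * out_c * s * s  # output activation of this layer
        c = out_c
    return params * BYTES_PER_FLOAT32, acts * BYTES_PER_FLOAT32


if __name__ == "__main__":
    p, a = estimate(batch=64, in_ch=3, size=224, layers=LAYERS)
    print(f"parameters:              {p / 2**20:.2f} MiB")
    print(f"activations per branch:  {a / 2**30:.2f} GiB")
    print(f"activations for triplet: {TRIPLET_BRANCHES * a / 2**30:.2f} GiB")
```

With a batch of 64 images at 224x224, the parameters of this toy stack need about 1 MiB, while the stored conv outputs alone need over 2 GiB per branch and roughly three times that for a triplet, which is how even a small model can run out of memory on a large GPU.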