RuntimeError: CUDA out of memory. Tried to allocate 1.12 MiB (GPU 0; 11.91 GiB total capacity; 5.52 GiB already allocated; 2.06 MiB free; 184.00 KiB cached)

I’m not aware of any issues that might cause unnecessary OOM errors.
If I’m not mistaken, SparseAdam computes the updates lazily:

In this variant, only moments that show up in the gradient get updated, and
only those portions of the gradient get applied to the parameters.
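To illustrate what "lazy" means here, below is a minimal pure-Python sketch of the standard bias-corrected Adam rule applied only to the rows present in a sparse gradient. It is not PyTorch's implementation. It assumes the parameter and both moment buffers are stored as lists of rows and that the gradient arrives already coalesced as a dict mapping row index to gradient row. The function name `sparse_adam_step` is hypothetical.

```python
import math


def sparse_adam_step(param, exp_avg, exp_avg_sq, grad_rows, step,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Hypothetical lazy Adam step.

    param, exp_avg, exp_avg_sq: lists of rows (lists of floats), updated in place.
    grad_rows: dict {row_index: gradient_row}, i.e. an already coalesced sparse gradient.
    step: global step count (starting at 1), used for bias correction.
    """
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    step_size = lr * math.sqrt(bias_correction2) / bias_correction1

    # Only rows that appear in the gradient are touched; all other rows keep
    # their parameters AND their moment estimates unchanged.
    for row, g in grad_rows.items():
        for j, gj in enumerate(g):
            exp_avg[row][j] = beta1 * exp_avg[row][j] + (1 - beta1) * gj
            exp_avg_sq[row][j] = beta2 * exp_avg_sq[row][j] + (1 - beta2) * gj * gj
            denom = math.sqrt(exp_avg_sq[row][j]) + eps
            param[row][j] -= step_size * exp_avg[row][j] / denom


# Usage: a 3x2 "embedding" where only row 1 received a gradient.
param = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
exp_avg = [[0.0, 0.0] for _ in range(3)]
exp_avg_sq = [[0.0, 0.0] for _ in range(3)]
sparse_adam_step(param, exp_avg, exp_avg_sq, {1: [0.5, -0.5]}, step=1)
```

After one step, rows 0 and 2 are untouched, while row 1 moves by roughly `lr` in the direction opposite its gradient.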

Could you simply be running out of memory for a specific input that uses more entries in your sparse input?
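One way to check this is to estimate the size of the sparse gradient for each batch and look for outliers. The sketch below assumes an uncoalesced COO gradient with one int64 index and one float32 row per embedding lookup (as produced by a sparse embedding lookup without padding). The helpers `sparse_grad_bytes` and `largest_batch`, and the input format (a list of index lists per batch), are hypothetical.

```python
def sparse_grad_bytes(num_lookups, embedding_dim, index_bytes=8, value_bytes=4):
    """Estimated bytes of an uncoalesced sparse gradient (assumption:
    one int64 index plus one float32 row of size embedding_dim per lookup)."""
    return num_lookups * (index_bytes + embedding_dim * value_bytes)


def largest_batch(batches, embedding_dim):
    """Return (batch_position, estimated_bytes) for the batch whose sparse
    gradient is estimated to be the largest. `batches` is a hypothetical
    iterable of lists of embedding indices."""
    best_pos, best_bytes = -1, -1
    for pos, indices in enumerate(batches):
        size = sparse_grad_bytes(len(indices), embedding_dim)
        if size > best_bytes:
            best_pos, best_bytes = pos, size
    return best_pos, best_bytes


# Usage: the second batch has the most lookups, so it is flagged.
pos, nbytes = largest_batch([[1, 2], [3, 4, 5, 6], [7]], embedding_dim=16)
```

If the batch that triggers the OOM matches the one flagged here, the error is likely input-dependent rather than a problem in the optimizer.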
