I’m not aware of any issues that might cause unnecessary OOM errors.
If I’m not mistaken, SparseAdam
will lazily compute the updates as:
In this variant, only moments that show up in the gradient get updated, and
only those portions of the gradient get applied to the parameters.
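To make the quoted behavior concrete, here is a minimal pure-Python sketch of such a lazy update. It is not PyTorch's actual implementation: `lazy_adam_step` is a hypothetical helper, the sparse gradient is modeled as a plain `{index: value}` dict, and the bias correction using a single global step count is my assumption.

```python
import math

def lazy_adam_step(params, exp_avg, exp_avg_sq, sparse_grad, step,
                   lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Hypothetical lazy Adam step: only indices present in sparse_grad are touched."""
    beta1, beta2 = betas
    # Bias correction based on the global step count (assumption of this sketch).
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step
    step_size = lr * math.sqrt(bias_c2) / bias_c1
    for i, g in sparse_grad.items():
        # Update the moments only where the gradient has an entry.
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        # Apply the update only to the matching parameter entries.
        params[i] -= step_size * exp_avg[i] / (math.sqrt(exp_avg_sq[i]) + eps)

params = [1.0, 2.0, 3.0, 4.0]
exp_avg = [0.0] * 4      # first-moment buffer, dense in this sketch
exp_avg_sq = [0.0] * 4   # second-moment buffer, dense in this sketch

# The gradient only has entries at indices 1 and 3.
lazy_adam_step(params, exp_avg, exp_avg_sq, {1: 0.5, 3: -2.0}, step=1)
print(params)
```

Indices 0 and 2 keep both their parameter values and their moments untouched. In this sketch the work per step scales with the number of entries in the gradient, which is why the number of entries per input matters for the question below.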
Could you just be running out of memory for a specific input that uses more entries in your sparse input than the others?