RuntimeError: reduce failed to get memory buffer: out of memory - After 30,000 iterations

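Are you storing tensors that are still attached to the computation graph, e.g. by appending the loss to a list or accumulating it in a running total?
If so, each stored tensor keeps its graph alive, so memory usage grows with every iteration until you eventually run out.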
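
In case it helps, here is a minimal sketch of what I mean. The model, data and variable names are made up for illustration and are not taken from your code. It contrasts storing the loss tensor itself with storing a detached copy or a plain Python float:

```python
import torch

# Hypothetical tiny setup, only for illustration.
model = torch.nn.Linear(4, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

attached_losses = []   # problematic: each entry still references its graph
detached_losses = []   # fine: tensors without autograd history
running_loss = 0.0     # fine: plain Python float

for step in range(5):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    attached_losses.append(loss)           # keeps grad_fn (and the graph) reachable
    detached_losses.append(loss.detach())  # drops the autograd history
    running_loss += loss.item()            # converts to a Python number
```

If you only need the value for logging, `loss.item()` or `loss.detach()` should be enough.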

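Also, you could (possibly) reduce the memory footprint by wrapping your training and evaluation code in functions, since Python uses function scoping as described here: local variables are released once the function returns, whereas variables created at the top level of a script stay alive until they are overwritten or deleted.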
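
A small pure-Python sketch of the scoping effect, assuming CPython's reference counting. `BigBuffer` is a hypothetical stand-in for a large tensor, and `evaluate` is just an example name:

```python
import gc
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large tensor."""

    def __init__(self, n):
        self.data = bytearray(n)


def evaluate():
    # buf is local, so the last reference disappears when the function returns.
    buf = BigBuffer(1_000_000)
    ref = weakref.ref(buf)  # weak reference: does not keep buf alive
    return len(buf.data), ref


size, ref_from_function = evaluate()
gc.collect()
freed_after_function = ref_from_function() is None

# The same object created outside a function stays alive,
# because the name global_buf still refers to it.
global_buf = BigBuffer(1_000_000)
ref_from_global = weakref.ref(global_buf)
gc.collect()
still_alive_outside_function = ref_from_global() is not None
```

The same applies to intermediate tensors: anything created inside a `train()` or `evaluate()` function and not returned can be freed as soon as the call finishes.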