Question about one trick used when the GPU memory overflows

ptrblck · November 12, 2018, 1:55pm

That should generally work.
Note that you might want to scale the gradients, as they are accumulated (summed into `.grad`) across `backward()` calls by default.
Here is a good explanation with some examples.
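
As a rough illustration of the scaling (my own minimal sketch, not taken from the linked explanation): assuming the trick is to split a batch into smaller chunks and call `backward()` on each one, dividing every chunk's loss by the number of chunks should reproduce the full-batch gradient. The model, the data shapes, and the `accumulation_steps` name below are made up for the example, and it assumes equally sized chunks and a loss with mean reduction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data, purely illustrative.
model = nn.Linear(4, 1)
criterion = nn.MSELoss()  # default reduction="mean" averages over the batch
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Reference: gradient from a single pass over the full batch.
model.zero_grad()
criterion(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Accumulation: split the batch into equal chunks and call backward() on each.
accumulation_steps = 4  # hypothetical name for the number of chunks
model.zero_grad()
for x, y in zip(inputs.chunk(accumulation_steps), targets.chunk(accumulation_steps)):
    # Scale the loss so the summed chunk gradients equal the full-batch mean.
    loss = criterion(model(x), y) / accumulation_steps
    loss.backward()  # adds into .grad instead of overwriting it
accumulated_grad = model.weight.grad.clone()

# An optimizer.step() would go here, once per accumulated batch.
grads_match = torch.allclose(full_grad, accumulated_grad, atol=1e-6)
print(grads_match)
```

Without the division by `accumulation_steps`, the accumulated gradient would be `accumulation_steps` times larger than the full-batch one, which effectively changes the learning rate.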