Question about a trick used when GPU memory overflows

That should generally work.
Note that you might want to scale the gradients, since they are accumulated (summed) by default.
Here is a good explanation with some examples.
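Below is a minimal sketch of why the scaling matters. It is written in plain Python so it runs without a GPU. It assumes the trick in question is gradient accumulation: you run several small micro-batches and step the optimizer once. The model is a hypothetical one-parameter linear model with mean-squared-error loss. `grad_mse` and `accumulated_grad` are illustrative names, not a library API. The running `grad` variable stands in for a gradient buffer that each backward pass adds to rather than overwrites.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the single weight w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps, scale=True):
    """Sum micro-batch gradients the way an accumulating .grad buffer would.

    Hypothetical helper: splits the data into `accum_steps` equal micro-batches
    and optionally divides each micro-batch gradient by `accum_steps`.
    """
    assert len(xs) % accum_steps == 0, "sketch assumes equal-sized micro-batches"
    size = len(xs) // accum_steps
    grad = 0.0  # stands in for a gradient buffer that is not zeroed between micro-batches
    for i in range(accum_steps):
        mb_x = xs[i * size:(i + 1) * size]
        mb_y = ys[i * size:(i + 1) * size]
        g = grad_mse(w, mb_x, mb_y)
        grad += g / accum_steps if scale else g
    return grad


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                                      # one big batch
scaled = accumulated_grad(w, xs, ys, accum_steps=2)             # accumulate with 1/k scaling
unscaled = accumulated_grad(w, xs, ys, accum_steps=2, scale=False)  # plain sum

print(full, scaled, unscaled)
```

With two equal-sized micro-batches, the scaled sum matches the full-batch gradient (-22.5), while the unscaled sum is twice as large (-45.0). The effect is the same as silently doubling the learning rate. If you are using PyTorch (an assumption here), the usual equivalent is to divide each micro-batch loss by the number of accumulation steps before calling `loss.backward()`, and to call `optimizer.zero_grad()` only after the optimizer step.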