CUDA out of memory error during forward pass

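Yes, if you sum the losses (or store references to their graphs in any other way), Autograd will keep the computation graphs, including the intermediate activations they hold, in memory until a backward operation is performed. That is why memory usage grows with every forward pass.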
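To make the difference concrete, here is a minimal sketch run on the CPU. The toy model, the random data, and names such as `total_loss` and `running_loss` are made up for illustration and are not taken from your code. It contrasts summing loss tensors, which keeps each batch's graph reachable, with summing plain Python floats via `.item()`, which is enough if you only need the value for logging.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)      # hypothetical toy model
criterion = nn.MSELoss()
# Hypothetical random batches standing in for a DataLoader
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

# Variant 1: summing loss tensors. The sum references every batch's
# graph, so none of them can be freed before backward() is called.
total_loss = 0
for x, y in batches:
    total_loss = total_loss + criterion(model(x), y)
graph_is_attached = total_loss.grad_fn is not None

# Variant 2: summing Python floats. .item() copies the value out of the
# tensor, so each batch's graph can be freed once `loss` is rebound.
running_loss = 0.0
for x, y in batches:
    loss = criterion(model(x), y)
    running_loss += loss.item()
```

Variant 2 is only suitable for tracking the loss value. If you need gradients from every batch, you still have to call `backward()` on each loss at some point.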
To accumulate gradients, you could take a look at this post, which explains different approaches along with their compute and memory usage.
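As a rough sketch of one common approach (not necessarily one of those covered in the post), you can call `backward()` on each batch's loss right away, so that its graph is freed, and only step the optimizer every few batches. The model, the data, and `accum_steps` below are placeholders I made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)      # hypothetical toy model
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Hypothetical random batches standing in for a DataLoader
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]
accum_steps = 2              # assumed number of batches per optimizer step

weight_before = model.weight.detach().clone()
num_updates = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(batches, start=1):
    # Scale so the accumulated gradient matches the average over the group
    loss = criterion(model(x), y) / accum_steps
    # backward() adds into each parameter's .grad and frees this batch's graph
    loss.backward()
    if step % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        num_updates += 1
```

With this pattern, only one batch's graph is alive at a time, so peak memory stays close to that of a single forward/backward pass.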