What happens is that PyTorch allocates the intermediate buffers on demand, and frees them as soon as they go out of scope.
Variables keep the history of all the computations that were performed to produce them.
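To make that concrete, here is a minimal sketch (the shapes and variable names are just made up for illustration):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(100, 100), requires_grad=True)
y = x * 2           # intermediate buffer allocated on demand
z = (y ** 2).sum()  # z keeps references to the whole history (x, y, ...)

del y               # the Python name is gone, but the buffer is still
                    # reachable through z's history as long as z is alive
z.backward()        # after backward, the graph's buffers are freed
```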
A common mistake is to do
current_loss += loss, and this will not free the memory, because you will be keeping track of the whole history of computations. You should instead do
current_loss += loss.data, for example.
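In a training loop it would look roughly like this (the toy model, loss and optimizer here are just placeholders, not from your code):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

current_loss = 0.0
for _ in range(100):
    inputs = Variable(torch.randn(32, 10))
    targets = Variable(torch.randn(32, 1))

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Wrong: current_loss += loss
    #   -> keeps the graph of every iteration alive, so memory keeps growing
    # Right: accumulate only the data, with no history attached
    current_loss += loss.data  # in newer releases, loss.item() gives a Python number
```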
Also, if you only want to perform forward-pass computations, using
volatile Variables will save you a ton of memory.
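A minimal forward-only sketch, assuming the old Variable API this refers to (the model is a placeholder; in PyTorch 0.4+ volatile was removed in favour of torch.no_grad()):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(100, 10)  # hypothetical model, just for illustration

# volatile=True marks the input (and everything computed from it) as
# forward-only: no history is recorded, so buffers needed only for
# backward are never kept around
inputs = Variable(torch.randn(64, 100), volatile=True)
outputs = model(inputs)

# Equivalent in PyTorch >= 0.4:
# with torch.no_grad():
#     outputs = model(inputs)
```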