In-place versus non-in-place operations: which is more efficient?

The PyTorch docs discourage in-place operations in most cases, saying they rarely improve efficiency: http://pytorch.org/docs/master/autograd.html?highlight=no_grad#in-place-operations-on-tensors
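For reference, the hazard the docs warn about is that an in-place write can clobber a value autograd saved for the backward pass. A minimal sketch of that failure mode (plain torch, not your module):

import torch

a = torch.randn(3, requires_grad=True)
b = torch.sigmoid(a)  # sigmoid saves its output for the backward pass
b.mul_(2)             # in-place write clobbers that saved output
b.sum().backward()    # RuntimeError: a variable needed for gradient
                      # computation has been modified by an inplace operation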

So is this in-place version more efficient:

self.var.div_(input.nelement())

than this non-in-place one?

self.var = self.var.div(input.nelement())

Which in-place operations are more efficient than their non-in-place counterparts?
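If it helps, here is a minimal sketch of the underlying difference between the two calls (plain tensors, outside autograd):

import torch

x = torch.ones(4)
y = x         # y aliases the same storage as x

x.div_(2)     # in-place: mutates the existing buffer, no new allocation
print(y)      # tensor([0.5000, 0.5000, 0.5000, 0.5000]) -- the alias sees it

x = x.div(2)  # non-in-place: allocates a fresh tensor and rebinds the name
print(y)      # the alias is untouched: still tensor([0.5000, ...])

So the in-place form avoids one temporary allocation per call, while the non-in-place form leaves any aliases (and anything autograd saved for backward) intact.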

So it seems that using the non-in-place version shaved off 181+ MiB of GPU memory:

self.var = self.var.div(input.nelement())
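For anyone who wants to reproduce the comparison, here is one way to measure peak allocator usage. This is only a sketch: peak_mib is a helper name I made up, the 8192x8192 tensor is a placeholder for the real self.var, and it measures raw allocations outside autograd, so it won't match the 181 MiB figure exactly:

import torch

def peak_mib(fn):
    # Reset the CUDA allocator's peak counter, run fn, report the peak in MiB.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

var = torch.randn(8192, 8192, device="cuda")  # placeholder, ~256 MiB of float32

print("in-place:     %6.1f MiB peak" % peak_mib(lambda: var.div_(100)))
print("non-in-place: %6.1f MiB peak" % peak_mib(lambda: var.div(100)))

On this toy case the non-in-place call should show the higher peak (it allocates a second 256 MiB buffer), so seeing the opposite inside a module with autograd tracking suggests the saving likely comes from how autograd handles the in-place op there, not from the division itself.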