In-place versus non-in-place operations: which are more efficient?

The PyTorch docs say here that, in most cases, in-place operations are less efficient:

So is this more efficient:


Than this?

self.var = self.var.div(input.nelement())
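To make the difference between the two forms concrete, here is a minimal sketch, assuming a standalone tensor `var` and an element count `n` as hypothetical stand-ins for `self.var` and `input.nelement()`. It compares storage addresses via `Tensor.data_ptr()`. The out-of-place `div()` must allocate a new result while the old tensor is still alive. The in-place `div_()` writes into the existing storage.

```python
import torch

# Hypothetical stand-ins for self.var and input.nelement() in the question.
var = torch.arange(1.0, 7.0)  # tensor([1., 2., 3., 4., 5., 6.])
n = var.nelement()            # 6

# Out-of-place: div() allocates a fresh result tensor. The old and new
# tensors coexist until the name is rebound, so the address must differ.
ptr = var.data_ptr()
var = var.div(n)
kept_storage_out_of_place = var.data_ptr() == ptr
print(kept_storage_out_of_place)  # False

# In-place: div_() overwrites the existing storage; no result buffer is allocated.
ptr = var.data_ptr()
var.div_(n)
kept_storage_in_place = var.data_ptr() == ptr
print(kept_storage_in_place)  # True
```

This only shows allocation for a single operation with autograd off. It says nothing about speed, or about what autograd saves for the backward pass.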

Which in-place operations are more efficient than their non-in-place counterparts?
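One likely reason the docs discourage in-place operations is autograd. An in-place edit can overwrite a tensor that autograd saved for the backward pass. The sketch below uses `sigmoid()`, whose backward pass reuses its output, purely as an illustrative choice. Modifying that output with `mul_()` makes `backward()` raise a `RuntimeError`, while the out-of-place `mul()` works.

```python
import torch

x = torch.ones(3, requires_grad=True)

# sigmoid() saves its *output* for the backward pass.
y = x.sigmoid()
y.mul_(2)  # in-place edit of a tensor autograd still needs

try:
    y.sum().backward()
    failed = False
except RuntimeError as err:
    # Autograd's version counter detects that the saved output was modified.
    failed = True
    print("backward failed:", err)

# The out-of-place version leaves the saved output untouched.
x.grad = None
y = x.sigmoid()
z = y.mul(2)
z.sum().backward()
print(x.grad)  # gradient of 2 * sigmoid(x), i.e. 2 * s * (1 - s)
```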

So it seems that using the non-in-place version saved more than 181 MiB of GPU memory:

self.var = self.var.div(input.nelement())
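A minimal sketch of one way to check a memory claim like this, assuming a CUDA device is available. `peak_bytes`, `in_place`, and `out_of_place` are hypothetical helpers, and the 1024 x 1024 tensor size is arbitrary. Results depend on whether autograd is recording, on what else is alive, and on PyTorch's caching allocator. `nvidia-smi` also reports memory reserved by that allocator, which can differ from `torch.cuda.memory_allocated()`.

```python
import torch

def peak_bytes(fn, device="cuda"):
    """Peak CUDA memory (bytes) allocated while fn runs; None if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return None
    torch.cuda.synchronize(device)
    torch.cuda.reset_peak_memory_stats(device)
    start = torch.cuda.memory_allocated(device)
    fn()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) - start

# Hypothetical workloads mirroring the two forms in the question.
def out_of_place():
    v = torch.randn(1024, 1024, device="cuda")
    v = v.div(v.nelement())

def in_place():
    v = torch.randn(1024, 1024, device="cuda")
    v.div_(v.nelement())

print(peak_bytes(out_of_place), peak_bytes(in_place))
```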