Does PyTorch have more "in-place" ops?

I am currently building memory-efficient PyTorch systems, and I keep running into an annoying problem: for many ops in PyTorch, I don't know how to ensure that the output is stored at a specific, preallocated location.

For example:
Say a = b * c, where b.shape = (1024, 16) and c.shape = (16,). What I want to do is:

  1. a = torch.empty(1024, 16)
  2. have the result of b * c written directly into that freshly allocated memory.

A workaround is bc = b * c followed by a.copy_(bc), but that costs an extra allocation and an extra copy (shown in the snippet below). The reason I want this: sometimes I know every op in the program ahead of time, and I want all the valuable results packed one after another in a single region of memory, while all the temporary activations are allocated elsewhere.
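
Here is the workaround in code (a minimal, runnable sketch matching the shapes above):

```python
import torch

b = torch.randn(1024, 16)
c = torch.randn(16)
a = torch.empty(1024, 16)   # step 1: preallocate the destination

bc = b * c                  # allocates a fresh temporary for the product
a.copy_(bc)                 # step 2, but via an extra copy I'd like to avoid
```
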
There may well be a trick for each individual op. My real question is: is there a uniform, op-independent way to solve this? For example, a wrapper used like wrap(mem, func, *args), which ensures that the final result is written into mem.
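
To make the interface concrete, here is a rough sketch of what I'm imagining. The wrap function is hypothetical; as a naive first stab, I'm assuming it can pass out=mem to ops that support the out= keyword and fall back to an explicit copy otherwise:

```python
import torch

def wrap(mem, func, *args, **kwargs):
    """Hypothetical wrapper: evaluate func(*args) and guarantee the
    result lands in the preallocated tensor `mem`."""
    try:
        # Many torch functions take an out= kwarg that writes into an
        # existing tensor; use it when the op supports it.
        return func(*args, out=mem, **kwargs)
    except TypeError:
        # Fallback: compute into a temporary, then copy -- exactly the
        # extra cost I am trying to avoid.
        return mem.copy_(func(*args, **kwargs))

b = torch.randn(1024, 16)
c = torch.randn(16)
a = torch.empty(1024, 16)
wrap(a, torch.mul, b, c)    # a now holds b * c via the out= path
```

But the try/except out= trick doesn't cover every op, so I'm wondering whether there is a genuinely general mechanism.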

Many thanks! Also, I don't know how to categorize this post, so advice on that is welcome too!