What is a meta tensor used for, and how does it help large models reduce their memory footprint?


I am reading this article about how meta tensors help speed up billion-parameter models. I am very interested in the notion of a "meta tensor", but I couldn't find much documentation about it.

As I understand it, a meta tensor is a tensor without any data; it only carries metadata, such as its shape. So when two meta tensors are combined by an operator, the result is also a meta tensor with an inferred shape. However, I cannot find any use of meta tensors beyond shape inference. Therefore, I would like to know:

  1. Are there any other uses for meta tensors?
  2. In the article I mentioned above, how exactly do meta tensors help speed up large models? What role do meta tensors play in DeepSpeed?
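For reference, this is the extent of what I have figured out so far — a minimal sketch of the shape-inference behavior, assuming a recent PyTorch version:

```python
import torch

# Meta tensors allocate no storage; they only track metadata
# such as shape, dtype, and device.
a = torch.empty(2, 3, device="meta")
b = torch.empty(3, 4, device="meta")

# Operators on meta tensors perform only shape/dtype inference,
# not the actual arithmetic, so this is essentially free.
c = a @ b
print(c.shape)    # torch.Size([2, 4])
print(c.is_meta)  # True
```

This runs without materializing any of the 2x3, 3x4, or 2x4 buffers, which is why I assume it is cheap even at billion-parameter scale — but I don't see how it goes beyond shape inference from here.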