I am reading this article about how meta tensors help speed up billion-parameter models. I am very interested in the notion of a “meta tensor”, but I couldn’t find much documentation about it.
As I understand it, a meta tensor is a tensor without content: it carries only metadata, such as its shape. When two meta tensors are combined by an operator, the result is also a meta tensor, with an inferred shape. However, I cannot find any use of meta tensors beyond shape inference. Therefore, I would like to know:
- Are there any other uses of meta tensors?
- In the article I mentioned above, how exactly do meta tensors help speed up large models? What role do meta tensors play in DeepSpeed?
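
For concreteness, here is a minimal sketch of what I mean, assuming the meta tensors in question are PyTorch tensors created with `device="meta"`. The first part shows the shape inference I described. The second part is only my guess at how this might relate to large models (build the module without allocating weights, then allocate storage later with `to_empty`); I have not confirmed that DeepSpeed works this way, and the layer size is an arbitrary example.

```python
import torch

# Tensors on the "meta" device carry shape and dtype but allocate no data.
a = torch.empty(1000, 2000, device="meta")
b = torch.empty(2000, 3000, device="meta")

# No arithmetic is actually performed; only the output metadata is computed.
c = a @ b
print(c.shape)    # torch.Size([1000, 3000])
print(c.is_meta)  # True

# Guess (assumption): a module can be constructed on the meta device, so its
# parameters exist only as shapes and no weight memory is allocated yet.
layer = torch.nn.Linear(1024, 1024, device="meta")
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)   # weight (1024 * 1024) plus bias (1024)

# Later, storage can be allocated on a real device. The values are
# uninitialized, so real weights (e.g. from a checkpoint) must still be loaded.
layer = layer.to_empty(device="cpu")
print(layer.weight.is_meta)  # False
```

If this guess is right, the benefit would be avoiding a full random initialization of the weights before the checkpoint overwrites them, but I would appreciate confirmation.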