Any plans to improve Long tensor arithmetic?

Many functions, e.g. CrossEntropyLoss (PyTorch 1.7.1 documentation) and Embedding (PyTorch 1.7.1 documentation), require a Long tensor as input, which has a significantly larger memory footprint than an fp16 or fp32 tensor even though the extra range is not actually needed or used. This leads to optimization problems when training larger models.
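For illustration, a minimal sketch of the dtype requirement and the per-element cost (the shapes, the class count, and the exact error message are assumptions and may differ across PyTorch versions):

```python
import torch
import torch.nn.functional as F

num_classes, batch_size = 1000, 64
logits = torch.randn(batch_size, num_classes)

# Class-index targets are expected as int64 (Long): 8 bytes per element.
target_long = torch.randint(0, num_classes, (batch_size,), dtype=torch.int64)
print(target_long.element_size())  # 8

# An int32 target would only need 4 bytes per element ...
target_int32 = target_long.to(torch.int32)
print(target_int32.element_size())  # 4

# ... but cross_entropy rejects it with a dtype error.
loss = F.cross_entropy(logits, target_long)   # works
try:
    F.cross_entropy(logits, target_int32)     # raises a RuntimeError about the dtype
except RuntimeError as e:
    print(e)
```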

  1. Are there workarounds to avoid using Long tensors?
  2. Are there any plans to extend support to other int precision types?
  3. Would there be reasons not to extend support to other int precision types?

Using float16 might be too limiting, as integer values can no longer be represented exactly starting at 2049, as described here.
float32 might be suitable, but it also depends on the overall gains.
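As a quick check of that float16 limit (2048 is the last integer that is exactly representable, since float16 only has a 10-bit mantissa):

```python
import torch

x = torch.tensor([2047, 2048, 2049, 2050], dtype=torch.float16)
print(x)  # tensor([2047., 2048., 2048., 2050.], dtype=torch.float16)
# 2049 rounds to 2048, so indices above 2048 could silently collide
# if they were stored in float16.
```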
While it’s true that LongTensors would need more memory, you are often not dealing with huge target tensors, so the memory savings might be small.
E.g. how large is the target tensor for nn.CrossEntropyLoss in your case and how much memory would you save if you could use e.g. int32?
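As a rough back-of-the-envelope comparison, here is an assumed segmentation-style target (the shapes are made up, just to put numbers on the savings):

```python
import torch

# Assumed example: per-pixel class targets for a segmentation model.
batch_size, height, width = 16, 512, 512
target = torch.randint(0, 100, (batch_size, height, width))  # int64 by default

bytes_int64 = target.nelement() * target.element_size()  # 8 bytes per element
bytes_int32 = target.nelement() * 4                       # hypothetical int32 target
print(f"int64: {bytes_int64 / 1024**2:.1f} MiB, int32: {bytes_int32 / 1024**2:.1f} MiB")
# int64: 32.0 MiB, int32: 16.0 MiB -- noticeable, but usually small next to
# the activations and parameters of a large model.
```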