It requires a Long (int64) tensor as input, which has a significantly larger memory footprint than an fp16 or fp32 tensor even though the extra precision is never needed or used. This becomes an optimization problem when training larger models.
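A minimal sketch of the footprint difference, assuming a PyTorch-style workload (the tensor size here is arbitrary, just for illustration):

```python
import torch

n = 10_000_000  # e.g. ten million indices

idx_long = torch.zeros(n, dtype=torch.long)   # int64: 8 bytes per element
idx_int = torch.zeros(n, dtype=torch.int32)   # int32: 4 bytes per element
x_fp16 = torch.zeros(n, dtype=torch.float16)  # fp16:  2 bytes per element

for name, t in [("int64", idx_long), ("int32", idx_int), ("fp16", x_fp16)]:
    print(f"{name}: {t.element_size() * t.nelement() / 2**20:.1f} MiB")
# int64: 76.3 MiB, int32: 38.1 MiB, fp16: 19.1 MiB
```

So the int64 requirement costs 2-4x the memory of the dtypes the rest of the model runs in.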
- Are there workarounds that avoid using Long tensors? (One possible approach is sketched after this list.)
- Are there any plans to extend support to other int precision types?
- Would there be reasons to not extend support to other int precision types?
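One workaround I can think of, sketched below under the assumption that the op in question is an `nn.Embedding`-style index lookup: keep the indices resident in int32 and cast to int64 only at the point of use, so the 8-byte copy is transient rather than held for the whole pipeline.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=50_000, embedding_dim=512)

# Keep indices in int32 to halve their resident footprint...
indices_int32 = torch.randint(0, 50_000, (1024,), dtype=torch.int32)

# ...and widen to int64 only for the lookup itself; the temporary
# int64 copy is freed as soon as the op returns.
out = embed(indices_int32.long())
```

This only shifts the int64 allocation from persistent to transient, though; native int32 (or narrower) index support would remove the cast and the temporary copy entirely.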