C++ Embedding layer: slow zero_grad() and backward() execution for a large vocabulary

How can I create a sparse embedding layer with the PyTorch C++ front end (libtorch)? With a large vocabulary, zero_grad() and backward() on the embedding weight are slow, so I'd like the embedding to produce sparse gradients instead of a dense gradient over the whole weight matrix.
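
For reference, this is roughly what I have in mind. It's only a minimal sketch, assuming that `torch::nn::EmbeddingOptions` exposes a `sparse(true)` flag like Python's `nn.Embedding(sparse=True)`, and that plain SGD (no momentum or weight decay) accepts the resulting sparse gradients; the vocabulary size, embedding dimension, and batch shape are made-up numbers for illustration:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Hypothetical sizes for illustration only.
  const int64_t vocab_size = 1000000;   // large vocabulary
  const int64_t embedding_dim = 128;

  // Embedding with sparse gradients: backward() should then only produce
  // gradient rows for the indices that were actually looked up, rather than
  // a dense vocab_size x embedding_dim gradient tensor.
  torch::nn::Embedding embedding(
      torch::nn::EmbeddingOptions(vocab_size, embedding_dim).sparse(true));

  // Plain SGD without momentum/weight decay, which I believe can consume
  // sparse gradients directly.
  torch::optim::SGD optimizer(embedding->parameters(),
                              torch::optim::SGDOptions(/*lr=*/0.1));

  // A small batch of token indices, shape [batch, seq_len].
  auto indices = torch::randint(vocab_size, {32, 16},
                                torch::TensorOptions().dtype(torch::kLong));

  optimizer.zero_grad();
  auto out = embedding->forward(indices);   // [32, 16, 128]
  auto loss = out.pow(2).mean();            // dummy loss just to get a gradient
  loss.backward();
  optimizer.step();

  // Check whether the weight gradient really came back as a sparse tensor.
  std::cout << "weight grad is sparse: "
            << embedding->weight.grad().is_sparse() << '\n';
  return 0;
}
```

My understanding is that with sparse gradients the cost of zero_grad() and of the update scales with the number of rows touched in the batch rather than with the full vocabulary, which is the whole point here, but I'd appreciate confirmation that this is the right way to set it up in the C++ API.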