Backprop Through sparse_dense_matmul

See this thread: Exploiting sparsity in batch operations?

torch.smm implements sparse * dense -> sparse, and it only pays off when the sparsity is strong enough (you'll need a really sparse tensor in the forward op). Sparse gradients are supported, and are implemented in the Embedding module.
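To make the Embedding point concrete, here is a minimal sketch of what a sparse gradient looks like there. It assumes a recent PyTorch build and passes `sparse=True` to `nn.Embedding`; the table size, indices, and variable names are made up for illustration and do not come from the thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only: a 1000-row table with 8-dim vectors.
# sparse=True asks for a sparse gradient w.r.t. the weight matrix.
emb = nn.Embedding(num_embeddings=1000, embedding_dim=8, sparse=True)

# Only a handful of rows are looked up in the forward pass.
idx = torch.tensor([3, 17, 3, 999])
loss = emb(idx).sum()
loss.backward()

grad = emb.weight.grad
grad_is_sparse = grad.is_sparse

# Coalescing merges the duplicate entry for row 3; the remaining
# indices are the rows that actually received a gradient.
touched_rows = sorted(set(grad.coalesce().indices()[0].tolist()))
print(grad_is_sparse, touched_rows)
```

Only the rows that were looked up carry gradient entries, so the backward pass never materializes a dense 1000 x 8 gradient. Note that only some optimizers accept sparse gradients (for example `torch.optim.SGD` and `torch.optim.SparseAdam`).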