Imposing sparsity for fast outer product?

I have batches of outer products of very large vectors that I need to sum up, of which I only need a sparse subset of the results (covariances I know in advance will be the useful ones). That is, I need to sum the products a[i,j] * b[i,k] over i for only a predetermined sparse list of [j,k] entries. The natural result would be a sparse matrix with nonzeros at those [j,k] positions.

In principle the computation should be cheap: b x s work (batch size x sparsity). The operation I want seems straightforward, but I can't find a way to do it without b x n x n work, and the sparse matrix documentation is … sparse. Is there a clever way in PyTorch to achieve the fast product I'm after?
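For reference, here is a minimal sketch of the gather-and-reduce workaround I've seen suggested for this kind of sampled product (sizes `b`, `n`, `s` and index names `jj`, `kk` are illustrative, not from any particular API). Since result[j,k] = sum_i a[i,j] * b[i,k] = (a.T @ b)[j,k], gathering only the needed columns keeps the work at b x s:

```python
import torch

# Illustrative sizes: batch b, vector dimension n, number of wanted entries s
b, n, s = 64, 1000, 500
a = torch.randn(b, n)
bm = torch.randn(b, n)

# Predetermined sparse list of [j, k] entries, as a 2 x s index tensor
idx = torch.randint(0, n, (2, s))
jj, kk = idx[0], idx[1]

# Gather the relevant columns and reduce over the batch dimension:
# O(b * s) work, never materializing the full n x n product
vals = (a[:, jj] * bm[:, kk]).sum(dim=0)  # shape (s,)

# Pack the values into a sparse COO matrix with nonzeros at (jj, kk)
result = torch.sparse_coo_tensor(idx, vals, (n, n))

# Sanity check of a few entries against the dense b x n x n computation
dense = a.T @ bm
assert torch.allclose(vals, dense[jj, kk], atol=1e-5)
```

This pattern is known in the literature as sampled dense-dense matrix multiplication (SDDMM); if your PyTorch version is recent enough, it may be worth checking whether `torch.sparse.sampled_addmm` (which works on sparse CSR tensors) covers your case natively.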