Simple cumulative sum of bag of words in sparse format?

I’m trying to compute something that I think should be fairly simple, but I haven’t found a way to do it other than the inefficient explicit one.

I have a sequence of tokens (from something other than a natural language).

I want an efficient sparse encoding of the bag-of-words count over a sliding window (or a non-sliding, cumulative-from-the-start count if that’s what it takes): at each position, a vector c where c_i is the number of times token index i occurred before that position.
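To make the explicit version concrete, here is a minimal sketch of what I mean (PyTorch; the sizes are made up). It materializes the full dense one-hot tensor, which is exactly what I’m trying to avoid:

```python
import torch

batch, seq_len, vocab_size = 2, 8, 10_000  # made-up sizes
tokens = torch.randint(vocab_size, (batch, seq_len))

# Dense one-hot: (batch, seq_len, vocab_size) -- enormous for a real vocabulary.
one_hot = torch.nn.functional.one_hot(tokens, vocab_size).float()

# Exclusive cumulative sum along the sequence axis: counts[b, t, i] is the
# number of times token i occurred before position t in sequence b.
counts = one_hot.cumsum(dim=1) - one_hot
```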

I want to apply nonlinear but zero-preserving functions to this, such as log(c + 1), while maintaining sparsity; the result then goes into a predictive model (which is dense after dimensionality reduction from the large word space).
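The nonlinearity itself seems like the easy part once I have a sparse COO tensor, since log1p maps 0 to 0 and can be applied to the stored values only. A sketch, continuing from the dense `counts` above:

```python
# Zero-preserving nonlinearity on a sparse COO tensor: transform only the
# stored values (log1p(0) == 0, so the sparsity pattern is unchanged).
sparse_counts = counts.to_sparse().coalesce()  # in practice I'd want to build this directly
sparse_log = torch.sparse_coo_tensor(
    sparse_counts.indices(),
    torch.log1p(sparse_counts.values()),
    sparse_counts.size(),
)
```

The hard part is producing `sparse_counts` without ever materializing the dense tensor.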

There is a batch dimension as well, of course.

This seems like an elementary problem that people must have addressed in NLP or sequence prediction, but I haven’t seen a clear answer. Conv1d does not support sparse tensors.
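For reference, this is the dense Conv1d version of the sliding-window count I would write if sparsity weren’t a concern (the window size is made up); it’s exactly what I can’t do with sparse input:

```python
window = 4  # made-up window size
x = one_hot.transpose(1, 2)  # (batch, vocab_size, seq_len)

# Depthwise conv with an all-ones kernel = per-token sliding-window count.
kernel = torch.ones(vocab_size, 1, window)
win_counts = torch.nn.functional.conv1d(
    x, kernel, padding=window - 1, groups=vocab_size
)[:, :, :seq_len]  # keep the causal part: counts over the last `window` positions
```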

What are the preferred ways to solve this?