I’m trying to compute something that I think should be fairly simple, but the only way I’ve found to do it is the inefficient, explicit one.
I have a sequence of tokens (from something other than a natural language).
I want an efficient sparse encoding of the cumulative token counts, i.e. a bag of words over a sliding window (or simply cumulative over the whole prefix, if a sliding window is too hard): at each position, c_i is the number of times token index i occurred previously in the sequence.
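For concreteness, here is the inefficient explicit version I have in mind (a minimal PyTorch sketch; the toy sequence and vocabulary size of 5 are placeholders):

```python
import torch
import torch.nn.functional as F

tokens = torch.tensor([3, 1, 3, 0, 1])   # toy token-id sequence, shape (T,)
vocab_size = 5                            # placeholder vocabulary size

one_hot = F.one_hot(tokens, vocab_size)   # dense (T, V) indicator matrix
counts = torch.cumsum(one_hot, dim=0)     # counts up to and including position t
counts_before = counts - one_hot          # shift to "strictly before position t"
```

This materializes a dense (T, V) tensor, which is exactly the waste I’d like to avoid for a large vocabulary.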
I then want to apply nonlinear but zero-preserving functions to these counts, such as log(c + 1), while maintaining sparsity, and feed the result into a predictive model (which is dense after dimensionality reduction from the large vocabulary space).
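What I mean by keeping it sparse, sketched by operating on the values of a COO tensor directly (rebuilding the toy counts so the snippet runs on its own):

```python
import torch
import torch.nn.functional as F

tokens = torch.tensor([3, 1, 3, 0, 1])
one_hot = F.one_hot(tokens, 5)
counts_before = torch.cumsum(one_hot, dim=0) - one_hot

sparse_counts = counts_before.to_sparse()  # COO: only nonzero counts are stored
log_counts = torch.sparse_coo_tensor(
    sparse_counts.indices(),
    torch.log1p(sparse_counts.values().float()),  # log(c + 1) on values only; zeros stay implicit
    sparse_counts.size(),
)
```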
There is, of course, a batch dimension as well.
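For example, the same dense construction with a batch axis (assuming a (B, T) tensor of token ids, with the cumulative sum running along the time dimension):

```python
import torch
import torch.nn.functional as F

tokens = torch.tensor([[3, 1, 3, 0, 1],
                       [0, 0, 2, 4, 4]])  # (B, T) toy batch
one_hot = F.one_hot(tokens, 5)            # (B, T, V)
counts_before = torch.cumsum(one_hot, dim=1) - one_hot
```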
This seems like an elementary problem that must have come up in NLP or sequence prediction, but I haven’t found a clear answer. Conv1d does not support sparse tensors, so that approach is out.
What are the preferred ways to solve this?