Hello,
OpenAI has released GPU kernels for block-sparse operations, and wrappers in TensorFlow.
https://github.com/openai/blocksparse
It looks like something not too complicated to port to PyTorch. I’ve never played with GPU kernels and have never contributed to PyTorch. Is anybody interested in doing it? I might do it eventually if nobody’s interested.
Cheers!