Using PixelCNN over patches

Is it possible to apply a PixelCNN over patches instead of individual pixels?

As patch embeddings become more popular in vision models, I am curious whether the same strategy can be applied to autoregressive models such as PixelCNN to make them faster. As far as I can tell, this should be possible as long as the patches are non-overlapping, so that the model predicts one patch at a time, conditioned on the patches generated before it. Am I missing something? What are the possible caveats of this approach?
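
To make the idea concrete, here is a minimal NumPy sketch of what I have in mind. It assumes a raster-scan ordering over patches; the helper names `patchify` and `raster_causal_mask` are hypothetical and not taken from any PixelCNN implementation. It only illustrates the ordering constraint, not how the distribution of pixels inside a patch would be modelled.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch blocks.

    Returns an array of shape (num_patches, patch * patch * C), with
    patches listed in raster order (left to right, top to bottom).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # Axes after reshape: (grid_row, row_in_patch, grid_col, col_in_patch, C).
    x = image.reshape(gh, patch, gw, patch, c)
    # Bring the two grid axes to the front so each patch is contiguous.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def raster_causal_mask(num_patches):
    """mask[i, j] is True when patch i may condition on patch j.

    Assumption: strict raster order, so patch i sees only patches j < i.
    """
    return np.tril(np.ones((num_patches, num_patches), dtype=bool), k=-1)

# Toy example: a 4x4 single-channel image split into 2x2 patches.
image = np.arange(16).reshape(4, 4, 1)
patches = patchify(image, patch=2)
mask = raster_causal_mask(len(patches))
```

With this setup, sampling would take one step per patch instead of one step per pixel, which is where I would expect the speed-up to come from.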