Yes, indeed, it breaks the backward chain. I guess in many contexts this is not a problem.
I think a context where you actually want to optimize the mask itself is more likely to be some kind of RL problem with a discrete action space.
Variable.nonzero is not implemented yet, as discussed in the link below. However, if it were, I wonder how the backward would be implemented, since it outputs indices…
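
To make the question about indices concrete, here is a rough pure-Python sketch (no PyTorch involved, so this is not how the library does it). The helper names `nonzero`, `select_forward` and `select_backward` are made up for illustration. The assumption is that the indices themselves are integers and get no gradient, while the values picked out with those indices can still receive one, by scattering the upstream gradient back into a zero-filled buffer.

```python
def nonzero(values):
    """Return the integer positions of the non-zero entries (no gradient here)."""
    return [i for i, v in enumerate(values) if v != 0]


def select_forward(values, indices):
    """Forward pass of selecting entries by index."""
    return [values[i] for i in indices]


def select_backward(grad_output, indices, input_size):
    """Backward pass of the selection (hypothetical helper).

    Scatters each upstream gradient to the position it was read from.
    Positions that were not selected keep a gradient of zero.
    """
    grad_input = [0.0] * input_size
    for g, i in zip(grad_output, indices):
        grad_input[i] += g
    return grad_input


x = [0.0, 2.0, 0.0, 5.0]
idx = nonzero(x)                    # discrete output: positions only
selected = select_forward(x, idx)   # values that can carry a gradient
grad_x = select_backward([1.0, 1.0], idx, len(x))
```

So in this sketch the gradient flows through the selected values, but there is nothing to differentiate with respect to the index list itself, which is the part I am unsure how a real backward for nonzero would handle.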