Require gradient only for some tensor elements (others should be held constant)

You can do that, but note that setting some of the elements of embed to zero in the forward step changes the result of the computation. Whether that is acceptable depends on your use case: do you want the masked elements to contribute to the forward pass or not? With embed = embed * mask, their contributions are set to zero.
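To make the effect concrete, here is a minimal sketch of the multiplicative mask. The tensor values and the squared-sum loss are made up for illustration; only the names embed and mask come from the discussion.

```python
import torch

# Illustrative values, not taken from the original thread.
embed = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
mask = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = keep, 0 = mask out

# Multiplying by the mask zeroes the masked elements in the forward pass.
masked = embed * mask

# Toy loss (an assumption for this example): sum of squares.
loss = (masked ** 2).sum()
loss.backward()

# Masked elements contribute nothing to the loss and get zero gradient.
print(masked)      # forward values seen by the rest of the model
print(embed.grad)  # gradient with respect to the original embed
```

So the masked elements receive no gradient, but they also no longer take part in the forward computation.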

My original use case was different: I had non-zero matrix elements that should contribute to the final result but not change during training. Hence I used a different approach.
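For readers with the same requirement, one possible way to get this behaviour is to split the tensor into a trainable part and a detached part. This is only a sketch under my own assumptions (illustrative values, a toy loss, plain SGD), and not necessarily the approach referred to above.

```python
import torch

# Illustrative values, not taken from the original thread.
embed = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
mask = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = trainable, 0 = held constant

# The forward value equals embed everywhere, but the constant part is
# detached, so no gradient flows back to those elements.
mixed = embed * mask + (embed * (1.0 - mask)).detach()

# Toy loss (an assumption for this example): sum of squares.
loss = (mixed ** 2).sum()
loss.backward()
grad = embed.grad.clone()

# Plain SGD without momentum or weight decay leaves zero-gradient
# elements unchanged.
optimizer = torch.optim.SGD([embed], lr=0.1)
optimizer.step()
updated = embed.detach().clone()
```

Here all four elements contribute to the loss, yet only the elements where mask is 1 are updated. Note that optimizer settings such as weight decay can still modify elements whose gradient is zero, so this sketch assumes they are switched off.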