Normalize the weight on the fly. One way to do this is to subclass nn.Conv2d. In the forward pass, compute weight = self.weight / ... and pass this local weight to torch.nn.functional.conv2d. This would not normalize the stored parameters (but you could do that after training and then replace your custom conv layer with the standard one).
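Here is a minimal sketch of that idea. The original leaves the exact normalization open (`self.weight / ...`), so the per-filter L2 norm used below is just one assumed choice; swap in whatever normalization you actually need:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedConv2d(nn.Conv2d):
    """Conv layer that normalizes its weight on the fly in forward.
    The stored parameter self.weight stays unnormalized."""

    def forward(self, x):
        # Assumed normalization: divide each output filter by its L2 norm.
        # eps avoids division by zero for all-zero filters.
        weight = self.weight / (
            self.weight.norm(p=2, dim=(1, 2, 3), keepdim=True) + 1e-8
        )
        # Use the functional API with the normalized local weight;
        # self.weight itself is never modified.
        return F.conv2d(x, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Usage: drop-in replacement for nn.Conv2d
conv = NormalizedConv2d(3, 16, kernel_size=3, padding=1)
out = conv(torch.randn(1, 3, 32, 32))
```

Since gradients flow through the normalization, the optimizer still updates the raw self.weight, while the effective weight applied to the input is always normalized.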
Note that the two strategies are fundamentally different, and you'd need to find out which one gives you better results.