BatchNorm2d is often used between ConvNet layers (e.g. in a standard ResNet block, where it appears as the BN layers).
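As a minimal sketch (layer sizes are just illustrative), a ResNet-style conv → BN → ReLU block looks like this:

```python
import torch
import torch.nn as nn

# Conv -> BN -> ReLU, the basic pattern inside a ResNet block.
# bias=False because BatchNorm's own shift makes the conv bias redundant.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),   # normalizes each of the 64 channels across the batch
    nn.ReLU(inplace=True),
)

x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width)
y = block(x)
print(y.shape)  # torch.Size([8, 64, 32, 32])
```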
Gradient clipping is often used in RNN models (such as LSTMs) because backpropagating through the recurrent structure over many time steps can cause gradients to blow up.
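In PyTorch that usually means one call to `torch.nn.utils.clip_grad_norm_` between `backward()` and `step()` (the sizes and `max_norm` here are arbitrary choices for the sketch):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
opt = torch.optim.SGD(lstm.parameters(), lr=0.1)

x = torch.randn(4, 50, 16)  # (batch, seq_len, features)
out, _ = lstm(x)
loss = out.pow(2).mean()    # dummy loss just to get gradients

opt.zero_grad()
loss.backward()
# Rescale the global gradient norm before the optimizer step, so an
# exploding gradient from a long sequence can't blow up the update.
torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm=1.0)
opt.step()
```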
I don’t think these are mutually exclusive. For example, in an OCR model built as a ConvNet feeding into an RNN, you may use BatchNorm2d in the ConvNet and gradient clipping for the sake of the RNN.
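Here's a toy sketch of that combination (the `TinyCRNN` model, its sizes, and the pooling trick are all hypothetical, just to show both techniques in one training step):

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Hypothetical toy OCR-style model: conv features (with BN) -> LSTM."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(32),           # BN lives in the conv part
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 25)),  # collapse height, keep 25 "time" steps
        )
        self.rnn = nn.LSTM(32, 64, batch_first=True)

    def forward(self, img):
        f = self.conv(img)                  # (B, 32, 1, 25)
        f = f.squeeze(2).permute(0, 2, 1)   # (B, 25, 32): a sequence for the LSTM
        out, _ = self.rnn(f)
        return out                          # (B, 25, 64)

model = TinyCRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

out = model(torch.randn(2, 1, 32, 100))
loss = out.pow(2).mean()  # dummy loss
opt.zero_grad()
loss.backward()
# Clip only the recurrent part's gradients; the BN'd conv stack is left alone.
torch.nn.utils.clip_grad_norm_(model.rnn.parameters(), max_norm=5.0)
opt.step()
```

(You could also clip all parameters; clipping just `model.rnn.parameters()` makes the "for the sake of the RNN" point explicit.)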
Hope this helps! (and curious whether others have differing opinions)