BatchNormalization vs. LayerNormalization

From my understanding, batch normalization normalizes across all samples in the batch for each channel, whereas layer normalization normalizes across all channels for each sample. But I am confused about why layer normalization is used more in language models while batch normalization is more common in CNNs. In my case, I want to extract correlated features between the channels of my dataset, and I am not sure which type of normalization I should use. From my understanding, it makes more sense to use layer normalization, since then I normalize over the channels instead of over the batch, which is what I am trying to avoid (I do not want the model to infer some relationship between the sequences within a batch).
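
To make my understanding of the axes concrete, here is a minimal PyTorch sketch (the shapes are just made up for illustration: a batch of 4 sequences, 3 channels, length 5):

```python
import torch
import torch.nn as nn

# Hypothetical input for illustration: (batch N=4, channels C=3, length L=5)
x = torch.randn(4, 3, 5)

# BatchNorm1d: one mean/var per channel, computed across the batch (N)
# and length (L) dimensions -- so statistics mix different samples.
bn = nn.BatchNorm1d(num_features=3)
y_bn = bn(x)

# Manual equivalent in training mode (weight=1, bias=0 at init):
mu = x.mean(dim=(0, 2), keepdim=True)                   # shape (1, 3, 1)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)   # biased variance
y_manual = (x - mu) / torch.sqrt(var + bn.eps)

# LayerNorm: one mean/var per sample, computed across the channel (C)
# and length (L) dimensions -- no mixing between samples in the batch.
ln = nn.LayerNorm(normalized_shape=[3, 5])
y_ln = ln(x)
```

So if I read this correctly, only BatchNorm1d computes statistics that couple the samples in a batch, while LayerNorm keeps each sample independent. Is that the right way to think about it, and does it justify picking layer normalization here?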