Does shuffling the samples interfere with training when using a 1-CCC loss function?

The Concordance Correlation Coefficient is defined as follows:

$$\mathrm{CCC} = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$$

where $\mu_x, \mu_y$ are the means, $\sigma_x^2, \sigma_y^2$ the variances, and $\rho$ the Pearson correlation between the two variables (here, the labels and the model outputs).

The loss function computed between the labels and the outputs of the model is $L = 1 - \mathrm{CCC}$.
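For concreteness, here is a minimal sketch of how I understand this loss (assuming PyTorch; the function name is mine, and the statistics are taken over whatever tensor is passed in, e.g. one mini-batch):

```python
import torch

def ccc_loss(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    """L = 1 - CCC, with means/variances/covariance computed over the given tensors."""
    mean_pred = y_pred.mean()
    mean_true = y_true.mean()
    var_pred = ((y_pred - mean_pred) ** 2).mean()
    var_true = ((y_true - mean_true) ** 2).mean()
    cov = ((y_pred - mean_pred) * (y_true - mean_true)).mean()
    ccc = 2.0 * cov / (var_pred + var_true + (mean_pred - mean_true) ** 2)
    return 1.0 - ccc
```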

My question is: if the samples have a temporal dependency (in this case they are clips from the same audio, which will probably have similar labels), does it matter whether the training set is shuffled before training? And is the batch size critical? A sketch of the training setup I have in mind is below.
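This is only an illustrative sketch of the setup, not my actual pipeline: the data, model, and optimizer are dummy placeholders, and the loop omits epochs and validation. The `batch_size` and `shuffle` arguments are the two things my question is about.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins: in the real setup these are clip-level audio features and
# their continuous labels; clips from the same audio sit next to each other.
features = torch.randn(1000, 40)
labels = torch.randn(1000)

model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loader = DataLoader(TensorDataset(features, labels),
                    batch_size=32,   # the batch size in question
                    shuffle=True)    # the shuffling in question

for x_batch, y_batch in loader:
    optimizer.zero_grad()
    y_pred = model(x_batch).squeeze(-1)
    loss = ccc_loss(y_pred, y_batch)  # CCC statistics come from this batch only
    loss.backward()
    optimizer.step()
```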