You could try to capture the iteration loss spikes and manually check which samples were included in the current batch, which might yield more information. E.g., these could be "bad" or "hard" samples that produce a high loss as well as suboptimal gradients, so that the following iterations are spent lowering the loss again.
However, this is pure speculation, and it would be interesting to see what you can find out about the data.
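To make the check concrete, here is a minimal sketch of the bookkeeping: it flags batches whose loss jumps well above the running mean and records the sample indices that were in them. The threshold factor, warmup length, and the assumption that you already log per-batch losses and sample indices (e.g. by making your `Dataset.__getitem__` return the index alongside the data) are all illustrative choices, not a fixed recipe.

```python
def spike_batches(batch_losses, batch_indices, factor=2.0, warmup=5):
    """Return (step, loss, indices) tuples for batches whose loss exceeds
    `factor` times the running mean of all previous batch losses.

    batch_losses:  per-iteration loss values, in training order
    batch_indices: per-iteration lists of dataset sample indices
    warmup:        skip the first few steps so the running mean is stable
    """
    flagged = []
    running_sum = 0.0
    for step, (loss, indices) in enumerate(zip(batch_losses, batch_indices)):
        if step >= warmup and loss > factor * (running_sum / step):
            flagged.append((step, loss, indices))
        running_sum += loss  # update after the check: mean of *previous* steps
    return flagged

# Example: a single spike at iteration 6 gets flagged with its sample indices.
losses = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 1.0]
indices = [[10 + i] for i in range(8)]
print(spike_batches(losses, indices))  # [(6, 5.0, [16])]
```

Once you have the flagged indices, you can pull those exact samples from the dataset and inspect them (corrupted inputs, wrong labels, outliers, etc.).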