I’m not sure I understand the concern. Since you are creating `idx` randomly in both cases, different results would be expected.
If you rerun the code, you should see different results unless you seed the random number generators.
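To illustrate the seeding point, here is a minimal sketch: reseeding the generator before each call makes the random permutation reproducible (the tensor sizes are arbitrary placeholders).

```python
import torch

# without a seed, each call to randperm returns a different permutation
torch.manual_seed(42)
idx_a = torch.randperm(10)

# reseeding with the same value reproduces the same permutation
torch.manual_seed(42)
idx_b = torch.randperm(10)

print(torch.equal(idx_a, idx_b))
```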
Assuming you don’t have any randomness in the model (e.g. dropout) or any layers that use the shuffled dimension in a sequential manner, your assumption might be correct. To verify it, you could create a single batch in both ordered and shuffled form, calculate the loss as well as the gradients, and compare both approaches.
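A minimal sketch of that check, assuming a plain linear model with an MSE loss (both are placeholders for your actual setup): the same samples are passed once in order and once permuted along the batch dimension, and the loss and gradients are compared.

```python
import torch

torch.manual_seed(0)

# placeholder model and loss; substitute your actual setup
model = torch.nn.Linear(4, 2)
criterion = torch.nn.MSELoss()

x = torch.randn(8, 4)
y = torch.randn(8, 2)
idx = torch.randperm(x.size(0))

# ordered batch
loss_ordered = criterion(model(x), y)
grads_ordered = torch.autograd.grad(loss_ordered, model.parameters())

# shuffled batch: same samples, permuted along the batch dimension
loss_shuffled = criterion(model(x[idx]), y[idx])
grads_shuffled = torch.autograd.grad(loss_shuffled, model.parameters())

# the results should match up to floating-point summation order
print(torch.allclose(loss_ordered, loss_shuffled, atol=1e-6))
for g_o, g_s in zip(grads_ordered, grads_shuffled):
    print(torch.allclose(g_o, g_s, atol=1e-6))
```

If the model did use the shuffled dimension sequentially (e.g. an RNN over that axis) or contained dropout, the comparison would be expected to fail.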