@ptrblck thanks for pointing me in the right direction. Using your code, there were no changes in the running stats in my code either, and everything was fine except for one thing: I was shuffling valid_dl. Because valid_dl was shuffled, the targets were shuffled along with the inputs, but I was then comparing the predictions against the non-shuffled y_val from train_test_split.
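In case it helps someone else, here is a minimal sketch of the mistake. It is not my actual code: the toy data, the `predict` helper, and the "model" that simply echoes its input are all made up for illustration, so a correct comparison should give an accuracy of 1.0.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy validation set (hypothetical): the label equals the feature value.
X_val = torch.arange(100, dtype=torch.float32).unsqueeze(1)
y_val = torch.arange(100)
ds = TensorDataset(X_val, y_val)


def predict(dl):
    """Collect predictions and the targets in the order the loader yields them."""
    preds, targets = [], []
    for xb, yb in dl:
        # Stand-in for model(xb): a "perfect" model that echoes the feature.
        preds.append(xb.squeeze(1).long())
        targets.append(yb)
    return torch.cat(preds), torch.cat(targets)


# The bug: the loader is shuffled, but predictions are compared with the
# unshuffled y_val, so the rows no longer line up.
preds, targets = predict(DataLoader(ds, batch_size=16, shuffle=True))
acc_wrong = (preds == y_val).float().mean().item()

# Comparing with the targets that came out of the same loader is fine.
acc_same_loader = (preds == targets).float().mean().item()

# The fix I used: do not shuffle the validation loader.
preds, _ = predict(DataLoader(ds, batch_size=16, shuffle=False))
acc_fixed = (preds == y_val).float().mean().item()

print(acc_wrong, acc_same_loader, acc_fixed)
```

Either keeping `shuffle=False` for validation or comparing against the targets yielded by the same loader avoids the mismatch.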
I am still not clear on what the running stats are for. What do they actually tell us, in layman's terms?