Is there any solution for adjusting BatchNorm statistics when using gradient accumulation?
In almost every gradient accumulation setup, the BatchNorm statistics are the issue: they are still computed over the reduced micro-batch size rather than over the full effective batch that gradient accumulation is supposed to emulate.
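
To make the mismatch concrete, here is a minimal sketch in plain Python (no framework). The helper `batch_stats` and the numbers are made up for illustration; it only assumes that BatchNorm normalizes each forward pass with the mean and biased variance of the samples in that pass.

```python
# Illustration only: plain Python, no deep learning framework involved.
# `batch_stats` is a hypothetical helper, not a library function.

def batch_stats(values):
    """Return (mean, biased variance) of one feature over a batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

# Activations of one feature for an effective batch of 8 samples (made-up numbers).
effective_batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Gradient accumulation over 2 steps: each forward pass sees only 4 samples.
micro_batches = [effective_batch[:4], effective_batch[4:]]

# What the statistics would be with a true batch of 8.
full_mean, full_var = batch_stats(effective_batch)

# What each accumulation step actually normalizes with.
micro_stats = [batch_stats(mb) for mb in micro_batches]

print("full batch :", full_mean, full_var)
for i, (m, v) in enumerate(micro_stats):
    print(f"micro-batch {i}:", m, v)
```

With these numbers the full batch has mean 4.5 and variance 5.25, while the two micro-batches have means 2.5 and 6.5 and variance 1.25 each, so the gradients get accumulated correctly but the normalization does not match the larger batch.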