Input size / Batch size

I am training a semantic segmentation model with 256×256 input. I was using a batch size of 16 and getting good results. For an unrelated reason I need to feed my model 32×32 images, so I split my input images into 32×32 tiles and now use a batch size of 1024. The model is now 2x slower and accuracy is much worse. What could be the reason?

You could profile your code via the native PyTorch profiler or Nsight Systems to get a clear answer, but based on your description it seems you have increased the data loading and processing workload by quite a bit, which could have created a bottleneck.
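
To make the workload point concrete, here is a rough back-of-the-envelope sketch. It assumes square tiles and counts pixels per channel only; the helper name `batch_stats` is made up for illustration. The pixel count per batch is the same in both setups, but the number of individual samples the data pipeline has to load, transform, and collate per batch goes up by 64x, so any fixed per-sample overhead is paid 64 times more often.

```python
def batch_stats(tile_size, batch_size):
    """Return (samples per batch, pixels per batch) for square tiles.

    Hypothetical helper for illustration; counts pixels per channel only.
    """
    return batch_size, batch_size * tile_size * tile_size


# Original setup: 256x256 inputs, batch size 16
old_samples, old_pixels = batch_stats(256, 16)

# New setup: 32x32 tiles, batch size 1024
new_samples, new_pixels = batch_stats(32, 1024)

# Number of 32x32 tiles cut from one 256x256 image
tiles_per_image = (256 // 32) ** 2

print("pixels per batch:", old_pixels, "vs", new_pixels)
print("samples per batch grew by:", new_samples // old_samples, "x")
print("tiles per original image:", tiles_per_image)
```

This only counts samples and pixels. It does not tell you where the time actually goes, which is what the profiler is for.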