-
Depending on where the bottleneck in your current system is, ~5 minutes might be expected. I just reran the script on a V100 server (which should, of course, be a bit more powerful) and it finished in 34s.
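In case it helps, this is roughly how I would time the GPU part of the run. It's a minimal sketch (the model and input are just placeholders for your actual script) that adds a warm-up and synchronizes before reading the clock, since CUDA operations are launched asynchronously:

```python
import time
import torch

# Placeholder model and input; swap in the objects from your own script
model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

# Warm-up iterations so kernel launches / cudnn tuning don't skew the timing
for _ in range(10):
    model(x)

# CUDA ops are asynchronous, so synchronize before and after timing
torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(100):
    model(x)
torch.cuda.synchronize()
print(f"elapsed: {time.perf_counter() - start:.3f}s")
```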
-
This post gives you a good overview of potential bottlenecks and their workarounds.
-
This effect is explained, e.g., in *Revisiting Small Batch Training for Deep Neural Networks*, which reports that the best performance is consistently obtained for batch sizes between 2 and 32.
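If you want to check this on your own setup, a quick way is to sweep the batch size via the `DataLoader` and compare the results. This is only a toy sketch (the dataset, model, and loss-based comparison are placeholders; you'd use your real model and a proper validation metric):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset as a stand-in for your real data
dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

def train_one_epoch(loader):
    # Fresh toy model per run so the batch sizes are compared fairly
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = torch.nn.CrossEntropyLoss()
    for data, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
    return loss.item()

# Sweep a few batch sizes, including the small (2-32) range from the paper
for batch_size in [2, 8, 32, 128]:
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    final_loss = train_one_epoch(loader)
    print(f"batch_size={batch_size}: final loss {final_loss:.4f}")
```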