I’m training AlexNet on the CIFAR dataset on a really beefy HPC node equipped with 256 cores and 8 AMD Instinct MI250X GPUs. Time per epoch is about five seconds. According to calflops, one forward+backward pass (input is resized to 227x227) takes 4.3 GFLOPs. There are 50k images in the training set, so that comes out to roughly 43 TFLOP/s of achieved throughput.
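For context, here is the back-of-envelope arithmetic behind that 43 TFLOP/s figure, plus a rough utilization estimate. The per-GPU peak value is an assumption on my part (AMD's spec sheet lists roughly 47.9 TFLOP/s FP32 vector throughput per MI250X; the matrix-core peak is higher), so treat the utilization number as a ballpark only:

```python
# Back-of-envelope check of achieved vs. peak throughput.
flops_per_pass = 4.3e9   # forward+backward FLOPs per image (from calflops)
images = 50_000          # CIFAR training set size
epoch_seconds = 5.0      # measured wall time per epoch

# Achieved throughput over one epoch.
achieved = flops_per_pass * images / epoch_seconds  # FLOP/s

# Assumed peak: 8 GPUs at ~47.9 TFLOP/s FP32 vector each (spec-sheet
# figure, an assumption -- matrix/packed-math peaks are much higher).
peak = 8 * 47.9e12

print(f"achieved:    {achieved / 1e12:.1f} TFLOP/s")
print(f"utilization: {achieved / peak:.1%}")
```

By this estimate the run sits at roughly 10% of the assumed FP32 vector peak, which is the kind of number I'd like a sanity check on.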
Is this fast? Given the hardware at my disposal, should I be satisfied? I have to train lots of models with different hyperparameter settings from scratch, so time invested in improving the training setup is well worth it.