Comparing GPU speeds

I am trying to train a captioning model in a multi-GPU setting using PyTorch, and I want to know which GPU would give me the shortest training time. I have access to 4 x Tesla M60 GPUs. For 512 videos, 1 epoch on the M60s took around 26 minutes. My dataset is much larger (~6k videos, each with 20 captions).
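For context, this is roughly how I am running the multi-GPU training and timing an epoch. The model and data below are toy stand-ins (random tensors and a small LSTM captioner), not my actual code, just to show the DataParallel setup and how I measure epoch time:

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real video features / captions, just to show the setup.
feats = torch.randn(512, 40, 2048)          # 512 "videos", 40 frames, 2048-d features
caps = torch.randint(0, 10000, (512, 20))   # 20-token captions, vocab of 10k
loader = DataLoader(TensorDataset(feats, caps), batch_size=16,
                    shuffle=True, num_workers=4, pin_memory=True)

class ToyCaptioner(nn.Module):
    """Toy stand-in for the real encoder-decoder captioning model."""
    def __init__(self, feat_dim=2048, hid=512, vocab=10000):
        super().__init__()
        self.enc = nn.LSTM(feat_dim, hid, batch_first=True)
        self.dec = nn.Linear(hid, vocab)
    def forward(self, x):
        out, _ = self.enc(x)
        return self.dec(out[:, :20, :])      # predict 20 caption tokens

device = torch.device("cuda")
model = nn.DataParallel(ToyCaptioner()).to(device)   # replicate across all visible GPUs
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
crit = nn.CrossEntropyLoss()

start = time.time()
for x, y in loader:
    x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
    optim.zero_grad()
    logits = model(x)
    loss = crit(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    loss.backward()
    optim.step()
torch.cuda.synchronize()                              # wait for GPU work before stopping the clock
print(f"1 epoch: {(time.time() - start) / 60:.2f} min")
```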
When I researched the M60, I found that it is an old GPU and that Nvidia has come out with better architectures since then, so I rented GPUs on vast.ai to try some of the newer ones.

  • Training on 256 videos, 1 epoch, 4 x RTX 2080 Ti. Time: 26 minutes. That is roughly twice as slow as the M60s, which got through 512 videos in the same time.
  • Then someone pointed me to this issue: https://github.com/pytorch/pytorch/issues/22961.
    I tried training on 4 x GTX 1080 Ti, 1 epoch, 256 videos. Time: 14 minutes. This certainly worked better than the 2080 Ti.
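I am not certain the linked issue is actually the cause in my case. If it comes down to the GPUs not being able to talk to each other directly, I think a quick diagnostic on a rented machine might look something like this (my own guess at a check, not taken from the issue):

```python
import torch

# Check whether each pair of GPUs can access each other's memory directly (P2P).
# When P2P is unavailable, multi-GPU communication typically goes through host
# memory instead, which can slow down data-parallel training.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```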

I want to know how these GPUs actually compare with each other. Is the M60 really superior to all of them in terms of speed? The quoted TFLOPS of the newer GPUs are higher than the M60's. Is that number enough to compare training times, or are there other factors that should be considered?
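If it helps frame an answer: I was thinking of sanity-checking raw throughput on each machine with a simple matmul benchmark like the one below, instead of relying only on the quoted TFLOPS. This is just my own rough idea, so the number it gives may not reflect real training speed:

```python
import time
import torch

def matmul_tflops(n=8192, iters=50, dtype=torch.float32, device="cuda:0"):
    """Rough achieved-TFLOPS estimate from repeated large FP32 matrix multiplies."""
    a = torch.randn(n, n, dtype=dtype, device=device)
    b = torch.randn(n, n, dtype=dtype, device=device)
    _ = a @ b                              # warm-up so one-off CUDA init is excluded
    torch.cuda.synchronize(device)
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize(device)
    elapsed = time.time() - start
    flops = 2 * n ** 3 * iters             # ~2*n^3 FLOPs per n x n matmul
    return flops / elapsed / 1e12

print(f"{torch.cuda.get_device_name(0)}: ~{matmul_tflops():.1f} TFLOPS (FP32)")
```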
Thank you.