Video card parameter that leads to faster network learning

I’d like to know the following:
If we have two different (CUDA-compatible) video cards, how can we find out which of them will lead to quicker network training?

Given that they will both compute (almost) the same results: mostly by timing, using %timeit in Jupyter or just Python's time.time() before and after. The one thing to get right is to call torch.cuda.synchronize() before you start timing and again after the batch (or whatever you're measuring) is done, because CUDA kernels are launched asynchronously.
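A minimal sketch of that kind of measurement might look like this (the `benchmark` helper, model, and batch are just placeholders for your own setup):

```python
import time
import torch

def benchmark(model, batch, n_iters=50, warmup=10):
    """Return the average time per forward pass in seconds."""
    device = next(model.parameters()).device
    # Warm-up iterations: the first calls include allocator and kernel
    # launch overhead that would otherwise skew the measurement.
    for _ in range(warmup):
        model(batch)
    # CUDA kernels launch asynchronously, so synchronize before and
    # after timing to measure actual GPU work, not just launch time.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / n_iters
```

Run the same benchmark on each card (e.g. `model.to("cuda:0")` vs `model.to("cuda:1")`) and compare the per-iteration times.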

Also, when the cards have different amounts of memory and that affects your batch size, or when one card is fp16-capable and the other is not (e.g. GTX cards are not), you probably want to include that in your estimate.
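To see what each card offers, you can query the device properties; roughly speaking, fast fp16 (Tensor Cores) requires compute capability 7.0 or higher:

```python
import torch

# List the CUDA devices visible to PyTorch with memory and
# compute capability (prints nothing if no GPU is available).
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB, "
          f"compute capability {props.major}.{props.minor}")
```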

Best regards


Doesn't the amount of FLOPS play any role in this?
Or CUDA cores?

CUDA cores, memory bandwidth, software support, and task characteristics all play a role. But really, it's complicated enough that people usually do empirical measurements. If you want something simple and both cards are still on the market, you could just go by $$$.