Difference in execution time between similar hardware

I have 2 computers, these are their configurations:

  1. 16 GB RAM, GeForce GTX 1070, i7-8700, Debian testing, Python 3.8.3;
  2. 32 GB RAM, GeForce GTX 1070, i7-7700K, Manjaro Linux 20.1, Python 3.8.

I run the same experiment in both machines, using PyTorch 1.6.0, CUDA 10.2 and NVIDIA drivers 440.100. In the first machine, one experiment takes 11 hours. In the second, it takes 48 hours.

I can't figure out what is so significantly different between the two computers that it causes such a large gap in execution time. Does anyone know what could cause this?


From checking the CPUs, it looks like one machine has a higher core count and turbo boost frequency than the other. That can speed things up a bit.
Also, what kind of drive are you using on each machine (in case you load your data from disk on the fly)?

Indeed, the slower machine uses a hard drive, while the faster one uses an M.2 drive. However, my data comes from a simulator and is not loaded from disk, so I don't believe that is where the slowdown comes from.

Also, while there is indeed a difference in the CPUs, their usage is never maxed out, so I don't know whether that is the bottleneck. In any case, I ran the same experiment on a server with an Intel Xeon Gold 5118 and a Tesla V100, and there it takes 28 hours.

I will try to profile my code on some of these machines to see if I can find out what's going on.
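For reference, a minimal way to do that with Python's built-in `cProfile` — wrap the training loop and print the functions with the highest cumulative time. The `train_step` function here is just a placeholder for one iteration of the actual experiment:

```python
import cProfile
import io
import pstats


def train_step():
    # Placeholder for one training iteration of the real experiment.
    total = 0
    for i in range(100_000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    train_step()
profiler.disable()

# Report the 10 most expensive calls, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

If the slowdown is in I/O rather than compute, it tends to show up here as time spent in file or socket calls instead of the training function itself.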


The problem was a library I use for logging my data (Weights & Biases). I thought it worked fully asynchronously and online, but it actually writes the data to a file on disk before sending it to their servers. This created lots of hard-drive accesses and, on the machines not equipped with an SSD, slowed the process down quite a bit. Mystery solved.
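To illustrate why per-step disk writes can dominate on a hard drive, here is a small self-contained sketch (not W&B's actual code) comparing logging that opens and syncs the file on every step against logging that buffers in memory and writes once:

```python
import os
import tempfile
import time


def log_per_step(path, steps):
    """Worst case: open, write, and fsync the log file on every step."""
    start = time.perf_counter()
    for i in range(steps):
        with open(path, "a") as f:
            f.write(f"step {i}\n")
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start


def log_buffered(path, steps):
    """Accumulate log lines in memory and write them out once."""
    start = time.perf_counter()
    lines = [f"step {i}\n" for i in range(steps)]
    with open(path, "a") as f:
        f.writelines(lines)
    return time.perf_counter() - start


tmpdir = tempfile.mkdtemp()
slow = log_per_step(os.path.join(tmpdir, "per_step.log"), 1000)
fast = log_buffered(os.path.join(tmpdir, "buffered.log"), 1000)
print(f"per-step fsync: {slow:.3f}s, buffered: {fast:.3f}s")
```

The gap between the two timings is far larger on a spinning disk than on an SSD, which matches the behavior in this thread: the same logging overhead that is barely noticeable on the M.2 machine becomes a major cost on the hard-drive machine.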


Thanks for the update!
Good to know that it was nothing bad on our side :slight_smile: